clyso / chorus

s3 multi provider data lifecycle management
Apache License 2.0

Example of configuration setup for a single bucket? #4

Open beddari opened 7 months ago

beddari commented 7 months ago

Hi there, we at Safespring were really intrigued when we saw this project released; it seems to address many of the same use cases we work with in our day-to-day operations. We had actually started developing a very similar tool internally, but you seem to have progressed further than we have so far.

We are new to the project: what would be the easiest way to get a working s3->s3 replication setup for a single bucket in a standalone configuration?

Would Docker Compose be a good basis for the setup? Do you have anything we could go by?

Looking forward to possibly collaborating!

arttor commented 7 months ago

Hi @beddari, thank you for your interest in our project. I will be happy to assist you and improve our documentation along the way!

The easiest setup for a try-out is the standalone binary. It contains all required components, including embedded Redis, making it easy to set up and install. However, it is not suitable for production because with the standalone option, you cannot have multiple workers or persist replication progress between restarts. In our organization, we use the Helm chart to run Chorus on K8s.

There are docs for the standalone setup, but here is a short version (feel free to ask questions):

One of my next tasks is to create an example and docs for Docker Compose, but that is not done yet.

beddari commented 7 months ago

Thanks, this should be enough to get us started. One of us will surely be in touch if we get stuck 👍🏼 The part that we couldn't figure out quickly ourselves was the CLI tool example.

arttor commented 7 months ago

One more thing that may be useful: if you are familiar with Go, you can run the Chorus e2e tests against your own s3 storages from your IDE, with a debugger, breakpoints, and so on. To do so, you need to:

  1. Install Go.
  2. Go to the e2e migration test directory, test/migration/.
  3. Edit the proxy and worker s3 storage config in the test init code, test/migration/init_test.go. Currently, the test starts 3 fake s3 storages (main, f1, and f2) and puts their URLs and credentials into the config, but you can remove the fake s3 storages and add your own credentials to the config Go struct: proxyConf.Storage.Storages["main"], proxyConf.Storage.Storages["f1"], etc.
  4. Create a new test or edit an existing one. The test is just a Go script that puts some data into the src s3 with an s3 Go client and communicates with Chorus directly over gRPC (instead of the CLI) to run replication and verify the results.

This is a bit low-level and more relevant for developers, but if you are fluent in Go, it might be easier to build everything from source, run it on your machine, and use gRPC instead of the CLI.