clyso / chorus

s3 multi provider data lifecycle management
Apache License 2.0

How to: Agent with standalone setup #10

Closed: zap51 closed this issue 3 months ago

zap51 commented 4 months ago

Hello, first of all, thank you so much for this project. I've spent a lot of time looking for S3 migration tools. We mostly use Ceph, and I'm currently doing a massive transfer of about 2.9 billion objects (around 500 TiB) using rclone. rclone works quite well, but it is very compute-intensive because of the LIST operations, and tracking object changes is difficult.

I came across this project, which looks very promising. For me the only missing piece is tracking real-time changes on a bucket without rerunning rclone from the start (LIST, compare, and so on). I've been working with it for about 2 hours at the time of writing.

Coming to the question: I don't want to put the Chorus proxy in the path yet for performance reasons (it is untested at the scale I'm dealing with). I'm wondering whether I could use the agent service instead of the proxy, relying on bucket notifications. The documentation says this is possible as long as the Chorus event notification HTTP endpoint is reachable, which I can arrange. I just applied this in my standalone setup but can't seem to get it working.

My repo is here and contains the needed config. In addition, I've added the following agent settings (which are not in the repo):

port: 9673
url: "http://172.19.1.171:9673"
fromStorage: "ceph--1"

The command below doesn't list any agents. Please let me know whether the standalone setup works with the agent.

# chorctl agent
FROM_STORAGE PUSH_URL

It looks like the port is also not bound:

# ss -tulpn | grep 9673
root@chorus-standalone:~# 

I'm preparing for the production deployment as well :)

Thanks, Jayanth

arttor commented 4 months ago

Hi @zap51, thank you for your interest in the project. The documentation is still a bit raw, and I am happy that we can improve it.

chorus-worker and chorus-agent use Redis to communicate. If you are using the standalone chorus distribution, it starts an embedded, non-persistent Redis on a random free port. The Redis address is printed to the standalone log. The port was 45917 in your case:

Redis URL:  127.0.0.1:45917

So this Redis address should be used in the agent config.yaml:

port: 9673 
url: "http://172.19.1.171:9673"
fromStorage: "ceph--1"
redis:
  address: "127.0.0.1:45917"

Then start the agent with the config overrides:

agent -config config.yaml

Now, if you query standalone with chorctl agent, you should see the agent in the list:

chorctl agent
FROM_STORAGE PUSH_URL
one http://localhost:9673

I've just tested it locally with the latest chorus version.
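Because the embedded Redis port is picked at random, it may differ on each standalone start, so the steps above can be scripted. The following is a minimal sketch, not part of chorus: it assumes the standalone output was saved to a file, and the function names `extract_redis_addr` and `write_agent_config` are hypothetical helpers. It relies only on the `Redis URL:` log line and the config keys shown above.

```shell
#!/bin/sh
# Sketch only: these helpers are illustrative and are not shipped with chorus.

# Print the address from the last "Redis URL:" line in a saved standalone log.
# $1 = path to the log file (saving the log to a file is an assumption).
extract_redis_addr() {
  sed -n 's/^.*Redis URL:[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1" | tail -n 1
}

# Write an agent config that points at the given Redis address.
# $1 = Redis address (host:port), $2 = output path.
# port, url and fromStorage are the values used earlier in this thread.
write_agent_config() {
  cat > "$2" <<EOF
port: 9673
url: "http://172.19.1.171:9673"
fromStorage: "ceph--1"
redis:
  address: "$1"
EOF
}
```

With the log saved as, say, standalone.log, running `write_agent_config "$(extract_redis_addr standalone.log)" config.yaml` would regenerate the config before starting the agent with `agent -config config.yaml`.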

PS It is strange that the agent was able to start without overriding the Redis config. You probably have another Redis instance running on the default port 6379.

PPS Please don't use standalone in production: the standalone version stores all data in memory, and everything will be lost when the app restarts. Consider using a dedicated Redis server with persistence enabled. Then install chorus with helm ~or point standalone~ and point the agent config to this dedicated Redis instance.

EDITED: it is not possible to use an external Redis with the standalone distribution. For production you can just install the worker and agent with helm, docker, or systemd and point them to the external Redis.

zap51 commented 4 months ago

Thanks @arttor. I'll try your suggestions out and come back.

Regards, Jayanth

zap51 commented 3 months ago

@arttor Sorry, it took me some time to come back here. We've started using the docker-compose method (for production, with persistence) and are evaluating it. Thanks, closing this now :)