danilop / yas3fs

YAS3FS (Yet Another S3-backed File System) is a Filesystem in Userspace (FUSE) interface to Amazon S3. It was inspired by s3fs but rewritten from scratch to implement a distributed cache synchronized by Amazon SNS notifications. A web console is provided to easily monitor the nodes of a cluster.
http://danilop.github.io/yas3fs
MIT License

Support native S3 events #147

Closed: timor-raiman closed this 7 years ago

timor-raiman commented 7 years ago

Closes #146. With this PR, it is possible to mount an S3 bucket that is being modified in parallel from outside of yas3fs. This is achieved by configuring the bucket to send automatic event notifications to the SQS queue that yas3fs already monitors. In this PR, we identify events caused by yas3fs itself via the user_id of the event initiator. This restricts the solution: the same AWS user must run yas3fs on all nodes of a cluster where native S3 events are configured.
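For reference, a minimal sketch of the bucket-side setup this PR assumes: routing object-created and object-removed events to an SQS queue. The queue ARN and bucket name below are placeholders, and `build_notification_config` is a hypothetical helper, not part of yas3fs.

```python
import json

def build_notification_config(queue_arn):
    """Build an S3 NotificationConfiguration payload that routes
    object-created and object-removed events to an SQS queue."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": [
                    "s3:ObjectCreated:*",
                    "s3:ObjectRemoved:*",
                ],
            }
        ]
    }

config = build_notification_config(
    "arn:aws:sqs:us-east-1:123456789012:yas3fs-queue")
print(json.dumps(config, indent=2))

# With boto3 this would be applied roughly as:
#   s3 = boto3.client("s3")
#   s3.put_bucket_notification_configuration(
#       Bucket="my-bucket", NotificationConfiguration=config)
```

The SQS queue policy must also allow the bucket to send messages to it, which is omitted here.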

liath commented 7 years ago

Unless I'm mistaken, this will be a breaking change. Everyone will need to add S3 events -> SQS now to their buckets. Perhaps put a command line flag in front of this?

timor-raiman commented 7 years ago

Nope, this shouldn't require everyone to enable S3 events. The new code only handles native events if they actually appear on the queue; the existing yas3fs message handling is unchanged.
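To illustrate why this is not a breaking change: native S3 event notifications arrive as JSON objects with a top-level "Records" key, so they can be told apart from yas3fs's own messages before dispatching. The function below is a hypothetical sketch of that distinction, not the actual code in this PR.

```python
import json

def classify_queue_message(body):
    """Classify a raw SQS message body.

    Native S3 event notifications are JSON objects containing a
    "Records" array; anything else is treated as an existing
    yas3fs message and handled by the unchanged code path.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return "yas3fs"
    if isinstance(payload, dict) and "Records" in payload:
        return "s3-event"
    return "yas3fs"
```

A bucket with no event configuration simply never produces "s3-event" messages, so existing deployments behave exactly as before.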

liath commented 7 years ago

Looks good to me. I'll try to do a test run today.

liath commented 7 years ago

Everything seems to work. Though it raises the question of why not drop all the existing message handling and switch completely to native S3 messages; it would simplify the codebase a little. I guess having separate messages would be useful for the web-console idea, though right now the only non-S3-related message I see being used is publish_status.

Let's see if we get some more input before merging this.

timor-raiman commented 7 years ago

The existing messages support a superset of operations, including metadata changes. They also provide a way to differentiate between events coming from different nodes, which would not be possible with native S3 messages alone: the only identification those carry is the user_id of the initiator, and all nodes may run as the same user.
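Concretely, each record in a native S3 event carries only a `userIdentity.principalId` field identifying the initiating AWS user, so the best a node can do is ignore events that its own cluster user produced. A hedged sketch of that check (the function name and sample values are illustrative, not from yas3fs):

```python
def is_own_event(record, own_principal_id):
    """Return True if an S3 event record was initiated by this
    cluster's own AWS user and should therefore be ignored.

    Since every node runs as the same user, this cannot tell
    *which* node caused the event, unlike yas3fs's own messages.
    """
    identity = record.get("userIdentity", {})
    return identity.get("principalId") == own_principal_id

# A record from a native S3 event notification (abridged):
record = {
    "eventName": "ObjectCreated:Put",
    "userIdentity": {"principalId": "AWS:AIDAEXAMPLEID"},
}
```

This is exactly why the existing per-node messages cannot be dropped: node-level attribution has no equivalent in the native event payload.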

liath commented 7 years ago

This looks good to me and ran fine when I tested it, could I trouble y'all to give it a look before I merge? Thanks :3

danilop commented 7 years ago

I didn't have time to test this, but it looks amazing, great job!