gaul / s3proxy

Access other storage backends via the S3 API
Apache License 2.0

Middleware - sharding to multiple backends #470

Open Jayd603 opened 2 years ago

Jayd603 commented 2 years ago

Is this currently possible? I see I can shard across virtual buckets, which is nice, but it's not clear whether it would allow sharding across multiple backends (in this case, identical backends using the file system). That feature would be great.

Although now that I think about it, technically it would be the same backend type (file system), but if I could at least choose different backend paths for the sharding, that could work.
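To make the "different backend paths" idea concrete, here is a minimal sketch of hash-based sharding across several filesystem paths. It is not S3Proxy code: the function name `pick_shard` and the example paths are hypothetical, and the hashing scheme is just one simple way to get a deterministic mapping.

```python
import hashlib
from typing import Sequence


def pick_shard(key: str, base_paths: Sequence[str]) -> str:
    """Deterministically map an object key to one of several backend paths.

    Hypothetical helper for illustration only; not part of S3Proxy.
    """
    if not base_paths:
        raise ValueError("at least one backend path is required")
    # Hash the key so that objects spread roughly evenly across paths.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(base_paths)
    return base_paths[index]


# Example paths are assumptions, not real S3Proxy configuration values.
paths = ["/mnt/disk1/blobs", "/mnt/disk2/blobs", "/mnt/disk3/blobs"]
shard = pick_shard("logs/2024/01/app.log", paths)
```

One caveat of this simple modulo scheme: adding or removing a path changes the mapping for most existing keys, so existing objects would need to be moved or looked up in more than one place.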

gaul commented 1 year ago

This is a reasonable feature request and something the startup Bounce Storage implemented. It is easy to make S3Proxy write to multiple backends, at least when all the putBlob requests succeed. However, it is more difficult to define policies for reading from a backend. Does it only read from one of them? Does it try to read from one and fail over to the second? What happens when a write fails on only one of the backends?
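The policy questions above can be illustrated with a small in-memory sketch. It picks one possible answer to each question (write to every backend and report which ones failed; read from the first backend that has the key, then fail over) purely as an example. All class and method names here are hypothetical and do not correspond to S3Proxy or jclouds APIs.

```python
class BackendError(Exception):
    """Raised by a backend when a write cannot be completed."""


class InMemoryBackend:
    """Stand-in for a real storage backend, used only for illustration."""

    def __init__(self, name, fail_writes=False):
        self.name = name
        self.fail_writes = fail_writes
        self.blobs = {}

    def put_blob(self, key, data):
        if self.fail_writes:
            raise BackendError(f"{self.name}: write failed")
        self.blobs[key] = data

    def get_blob(self, key):
        # Raises KeyError when the key is absent.
        return self.blobs[key]


class ReplicatingStore:
    """Writes to all backends; reads fail over in backend order."""

    def __init__(self, backends):
        self.backends = list(backends)

    def put_blob(self, key, data):
        # Attempt every backend and return the names of those that failed,
        # leaving it to the caller to decide whether to retry or roll back.
        failed = []
        for backend in self.backends:
            try:
                backend.put_blob(key, data)
            except BackendError:
                failed.append(backend.name)
        return failed

    def get_blob(self, key):
        # Try each backend in order; fall through to the next on a miss.
        for backend in self.backends:
            try:
                return backend.get_blob(key)
            except KeyError:
                continue
        raise KeyError(key)
```

Even this small sketch shows the hard part: after a partial write failure the backends disagree, and the read path silently hides that by failing over. A real implementation would need a decision on repair, retries, or rejecting the write.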

yusufozturk commented 3 months ago

@gaul our use case is having a filesystem endpoint for realtime data and an S3 endpoint for archive data. Our optimizer service moves realtime data to the S3 endpoint in batches, for example every 15 minutes or every 200 MB. To merge these two paths, we need a single endpoint with multiple backends.

One suggestion would be a single endpoint port but different paths, for example:

http://endpointname:8585/realtime -> file system backend
http://endpointname:8585/archive -> s3 backend

or

http://endpointname:8585/s3 -> s3 backend
http://endpointname:8585/blob -> azure blob storage backend

Is this technically possible with the S3 API? I believe a simple API gateway could do this, but maybe the S3 API works differently?
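As a rough illustration of the gateway idea, the sketch below routes a request to a backend based on the first path segment of the URL. One thing to keep in mind: with path-style S3 requests, the first path segment is the bucket name, so a prefix such as `realtime` would look like a bucket to an S3 client. The routing table and the `route` function are hypothetical, and nothing here reflects how S3Proxy is actually configured.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from first path segment to a backend label.
ROUTES = {
    "realtime": "filesystem",
    "archive": "s3",
}


def route(url, routes=ROUTES):
    """Return (backend, remaining_path) for a request URL.

    Illustrative only; a real gateway would also have to handle
    request signing, which covers the request path.
    """
    path = urlsplit(url).path
    first, _, rest = path.lstrip("/").partition("/")
    if first not in routes:
        raise LookupError(f"no backend configured for prefix {first!r}")
    return routes[first], rest


backend, remainder = route("http://endpointname:8585/realtime/2024/data.bin")
```

Because S3 request signatures include the request path, a gateway that rewrites paths before forwarding would generally need to verify the incoming signature itself and re-sign the outgoing request, which is more than a plain reverse proxy does.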