Open Jayd603 opened 2 years ago
This is a reasonable feature request and something the startup Bounce Storage implemented. It is easy to make S3Proxy write to multiple backends, at least when all the putBlob requests succeed. However, it is harder to define policies for reading from the backends. Does it read from only one of them? Does it try to read from one and fail over to the second? What happens when a write fails on only one of the backends?
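To make those policy questions concrete, here is a minimal sketch of one possible answer: write to every backend and report partial failures, and read from backends in order with failover. It is illustrative Python, not S3Proxy code; `InMemoryBackend`, `MirroredStore`, and `BackendError` are hypothetical names, and the in-memory dict stands in for a real blobstore.

```python
class BackendError(Exception):
    """Raised when a backend cannot serve a request (hypothetical)."""


class InMemoryBackend:
    """Stand-in for a real blobstore; not an S3Proxy class."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blobs = {}

    def put_blob(self, key, data):
        if not self.healthy:
            raise BackendError(f"{self.name} is unavailable")
        self.blobs[key] = data

    def get_blob(self, key):
        if not self.healthy:
            raise BackendError(f"{self.name} is unavailable")
        if key not in self.blobs:
            raise KeyError(key)
        return self.blobs[key]


class MirroredStore:
    """Assumed policy: write to all backends, read with ordered failover."""

    def __init__(self, backends):
        self.backends = list(backends)

    def put_blob(self, key, data):
        """Write to every backend; return the names of backends that failed."""
        failed = []
        for backend in self.backends:
            try:
                backend.put_blob(key, data)
            except BackendError:
                failed.append(backend.name)
        if len(failed) == len(self.backends):
            raise BackendError("write failed on all backends")
        return failed

    def get_blob(self, key):
        """Try each backend in order; move on if it is down or lacks the key."""
        for backend in self.backends:
            try:
                return backend.get_blob(key)
            except (BackendError, KeyError):
                continue
        raise KeyError(key)
```

Note what this sketch leaves open: after a partial write failure the backends disagree, so a later read can return stale data or miss the object, depending on which backend answers first. Repairing that divergence (retry queue, background sync, or failing the whole write) is exactly the policy decision raised above.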
@gaul our use case is a filesystem endpoint for realtime data and an S3 endpoint for archive data. Our optimizer service moves realtime data to the S3 endpoint in batches, for example every 15 minutes or every 200 MB. To merge these two paths, we need a single endpoint with multiple backends.
One suggestion would be a single endpoint port but different paths, for example:
http://endpointname:8585/realtime -> file system backend
http://endpointname:8585/archive -> s3 backend
or
http://endpointname:8585/s3 -> s3 backend
http://endpointname:8585/blob -> azure blob storage backend
Is this technically possible with the S3 API? I believe a simple API gateway could do this, but maybe the S3 API works differently?
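One thing worth noting: in path-style S3 requests the first path segment is the bucket name, so routing on `/realtime` versus `/archive` amounts to routing per bucket. A rough sketch of that lookup follows, assuming path-style addressing. The `ROUTES` table and `route` function are hypothetical and are not an existing S3Proxy option.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from bucket name (first path segment) to a backend label.
ROUTES = {
    "realtime": "filesystem",
    "archive": "s3",
}


def route(url, routes=ROUTES):
    """Return (backend, bucket, key) for a path-style request URL."""
    path = urlsplit(url).path.lstrip("/")
    # Everything before the first "/" is the bucket; the rest is the object key.
    bucket, _, key = path.partition("/")
    if bucket not in routes:
        raise LookupError(f"no backend configured for bucket {bucket!r}")
    return routes[bucket], bucket, key


if __name__ == "__main__":
    print(route("http://endpointname:8585/realtime/sensor/1.json"))
    print(route("http://endpointname:8585/archive/2023/batch-01.parquet"))
```

This only covers the lookup itself. Requests that are not scoped to one bucket, such as listing all buckets, would need to be answered by merging results from every backend.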
Is this currently possible? I see I can shard across virtual buckets, which is nice, but it's not clear whether that would allow sharding across multiple backends (in this case they would be identical backends using the file system). That feature would be great.
Although now that I think about it, technically it would be the same backend type (file system), but if I could at least choose different backend paths for the sharding, that could work.
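For illustration, sharding across several file system paths could look roughly like the sketch below: hash the object key and use the result to pick a base directory. This is an assumption about how such a feature might behave, not how S3Proxy's bucket sharding works; `shard_path` and the directory names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def shard_path(key, base_dirs):
    """Pick a base directory for `key` using a stable hash of the key.

    Assumes `key` does not start with "/" and `base_dirs` is non-empty.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # A stable hash keeps the same key on the same directory across restarts.
    index = int.from_bytes(digest[:8], "big") % len(base_dirs)
    return PurePosixPath(base_dirs[index]) / key


if __name__ == "__main__":
    dirs = ["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"]  # hypothetical mount points
    for name in ("logs/a.txt", "logs/b.txt", "img/c.png"):
        print(name, "->", shard_path(name, dirs))
```

With plain modulo placement, changing the number of directories remaps most keys, so the list of paths would have to stay fixed or existing data would need to be migrated.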