gaul / s3proxy

Access other storage backends via the S3 API
Apache License 2.0

Create a middleware to change the storageclass #625

Open michaelcourcy opened 2 months ago

michaelcourcy commented 2 months ago

Some tools are configured to write to S3 with no option to configure the storage class. This feature request would let those tools use the proxy with the default or no storage class, and the proxy would write to the proxied S3 with a target storage class.

We need this because some tools predate the storage class feature introduced by AWS. We also have cases where these tools offer this option for AWS but not for S3-compatible backends, even though MinIO or Ceph let you build S3 solutions with different storage classes.

The natural approach would be to change those tools to add the storage class information to their PUT requests, but often this feature is not prioritised by the development team for various reasons, for example because upgrading their library would create too much friction with their existing codebase.

Here is an example of how one would use this middleware:

    s3proxy.storageclass.default=STANDARD_IA
    s3proxy.storageclass.STANDARD=STANDARD_IA

This configuration makes sure that when no storage class information is provided (default) or the STANDARD storage class is requested, s3proxy will write to the STANDARD_IA storage class.

In this situation:

    s3proxy.storageclass.GLACIER=HDD_SC

any PUT request for the GLACIER storage class will be translated to HDD_SC.

This should also work the other way: any GET request that returns HDD_SC should be transformed back into GLACIER.
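
A rough sketch of the mapping this implies (the class and method names here are hypothetical, not S3Proxy code): a forward map applied to the x-amz-storage-class header on PUT, and its inverse applied to GET/HEAD responses.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of the proposed mapping, not S3Proxy's actual code.
    public final class StorageClassMapper {
        private final Map<String, String> putMapping;
        private final Map<String, String> getMapping = new HashMap<>();
        private final String defaultClass;

        StorageClassMapper(Map<String, String> putMapping, String defaultClass) {
            this.putMapping = putMapping;
            this.defaultClass = defaultClass;
            // invert the map so GET responses report the class the client expects
            for (Map.Entry<String, String> e : putMapping.entrySet()) {
                getMapping.put(e.getValue(), e.getKey());
            }
        }

        // value to send upstream in x-amz-storage-class on PUT
        String mapPut(String requested) {
            if (requested == null) {
                return defaultClass;  // no class supplied by the client
            }
            return putMapping.getOrDefault(requested, requested);
        }

        // value to report back to the client on GET/HEAD
        String mapGet(String stored) {
            return getMapping.getOrDefault(stored, stored);
        }

        public static void main(String[] args) {
            // mirrors the example configuration above
            StorageClassMapper mapper = new StorageClassMapper(
                    Map.of("STANDARD", "STANDARD_IA", "GLACIER", "HDD_SC"),
                    "STANDARD_IA");
            System.out.println(mapper.mapPut(null));       // STANDARD_IA
            System.out.println(mapper.mapPut("GLACIER"));  // HDD_SC
            System.out.println(mapper.mapGet("HDD_SC"));   // GLACIER
        }
    }

Note that the reverse map is ambiguous when several sources target the same class: with the configuration above, an object stored as STANDARD_IA would always be reported as STANDARD, even if it was written with no storage class at all.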

gaul commented 2 months ago

This is a good feature request and something that is straightforward to support for some common use cases. S3Proxy uses Apache jclouds to enable portability between object stores. S3 has more storage classes than jclouds has tiers, as this mapping shows:

   public enum StorageClass {
      STANDARD(Tier.STANDARD),
      STANDARD_IA(Tier.INFREQUENT),
      ONEZONE_IA(Tier.INFREQUENT),
      INTELLIGENT_TIERING(Tier.STANDARD),
      REDUCED_REDUNDANCY(Tier.STANDARD),
      GLACIER(Tier.ARCHIVE),
      GLACIER_IR(Tier.ARCHIVE),
      DEEP_ARCHIVE(Tier.ARCHIVE);
   }

Azure supports a Cold tier and GCS supports Coldline, so adding one more level like Tier.MORE_INFREQUENT is obvious (although a terrible name).
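
Illustratively, the extended tier enum might look like this (placeholder names, not a committed jclouds API):

    public enum Tier {
        STANDARD,        // S3 STANDARD, INTELLIGENT_TIERING, REDUCED_REDUNDANCY
        INFREQUENT,      // S3 STANDARD_IA, ONEZONE_IA
        MORE_INFREQUENT, // Azure Cold, GCS Coldline
        ARCHIVE          // S3 GLACIER, GLACIER_IR, DEEP_ARCHIVE
    }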

But for the pure S3 use case, supporting all the storage classes would require adding non-portable logic to putBlob. We should also consider whether getBlob needs non-portable treatment or whether we can ignore a loss of fidelity.
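
For context, the portable jclouds path only carries the coarse tier, so the finer-grained S3 class is already gone by the time putBlob runs. A minimal sketch, assuming jclouds 2.2+'s BlobBuilder.tier:

    import org.jclouds.ContextBuilder;
    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.BlobStoreContext;
    import org.jclouds.blobstore.domain.Blob;
    import org.jclouds.blobstore.domain.Tier;

    public final class PortablePut {
        public static void main(String[] args) {
            BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
                    .credentials("identity", "credential")
                    .buildView(BlobStoreContext.class);
            BlobStore blobStore = context.getBlobStore();
            // Tier is the only portable knob: STANDARD_IA and ONEZONE_IA
            // both collapse to INFREQUENT before putBlob sees them.
            Blob blob = blobStore.blobBuilder("key")
                    .payload("hello")
                    .tier(Tier.INFREQUENT)
                    .build();
            blobStore.putBlob("container", blob);
            context.close();
        }
    }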

michaelcourcy commented 2 months ago

But for the pure S3 use case, supporting all the storage classes would require adding non-portable logic to putBlob

In Ceph you can create arbitrary names for storage classes when you build your zone group placement, even if that is not a good idea in my opinion.

gaul commented 2 months ago

Yeah I think this enum was a mistake and supporting arbitrary strings would be better for OpenStack Swift and future use cases. Let me see how easy it is to at least allow creating with an arbitrary storage class.
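
If storage classes become opaque strings, the proposed s3proxy.storageclass.* properties could parse straight into a string-to-string map, which would accept arbitrary backend names like HDD_SC unchanged. A hypothetical sketch:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    // Hypothetical: storage classes as opaque strings rather than enum constants.
    public final class StorageClassProperties {
        private static final String PREFIX = "s3proxy.storageclass.";

        static Map<String, String> parse(Properties props) {
            Map<String, String> mapping = new HashMap<>();
            for (String name : props.stringPropertyNames()) {
                if (name.startsWith(PREFIX)) {
                    // the key after the prefix is the source class ("default" for none)
                    mapping.put(name.substring(PREFIX.length()),
                            props.getProperty(name));
                }
            }
            return mapping;
        }

        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("s3proxy.storageclass.default", "STANDARD_IA");
            props.setProperty("s3proxy.storageclass.GLACIER", "HDD_SC");
            // prints both mappings, e.g. {GLACIER=HDD_SC, default=STANDARD_IA}
            System.out.println(parse(props));
        }
    }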