Blob storage systems like S3 offer a variety of storage tiers that let users trade storage cost (a function of the volume of data stored) against access cost (a function of the number of API calls made). Today, plugins like `repository-s3` let you select a single storage tier for all your snapshot data. However, a repository really holds two different kinds of blob:
- Metadata blobs are small, and we read them when taking snapshots, deleting snapshots, listing the repository contents, and otherwise manipulating the repository.
- Data blobs hold the files that make up each snapshotted shard and therefore account for the bulk of the repository's volume, but we only read them when restoring a shard or otherwise accessing its contents.
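To make the trade-off concrete, here is a toy cost model in Java. The tier names, prices and request counts are invented placeholders, not real S3 pricing; the only point is that which tier is cheapest depends on whether stored volume or request count dominates a workload.

```java
import java.util.Comparator;
import java.util.List;

/** Toy cost model for the storage-vs-access trade-off. All prices are hypothetical placeholders. */
public class TierCostSketch {

    /** A storage tier: price per GB stored per month, and price per 1,000 API requests. */
    record Tier(String name, double pricePerGbMonth, double pricePerThousandRequests) {}

    /** A set of blobs: total volume stored and API requests made against it per month. */
    record Workload(String name, double gigabytes, long requestsPerMonth) {}

    // Hypothetical tiers: one cheap to access, one cheap to store.
    static final List<Tier> TIERS = List.of(
            new Tier("frequent-access", 0.020, 0.0004),
            new Tier("infrequent-access", 0.010, 0.0100));

    // Hypothetical workloads: metadata is small but hot, data is large but rarely read.
    static final Workload METADATA = new Workload("metadata", 1, 5_000_000);
    static final Workload DATA = new Workload("data", 10_000, 1_000);

    /** Storage cost scales with volume; access cost scales with the number of API calls. */
    static double monthlyCost(Tier tier, Workload w) {
        double storage = tier.pricePerGbMonth() * w.gigabytes();
        double access = tier.pricePerThousandRequests() * (w.requestsPerMonth() / 1000.0);
        return storage + access;
    }

    /** Returns the tier with the lowest monthly cost for the given workload. */
    static Tier cheapest(List<Tier> tiers, Workload w) {
        return tiers.stream()
                .min(Comparator.comparingDouble(t -> monthlyCost(t, w)))
                .orElseThrow();
    }

    public static void main(String[] args) {
        for (Workload w : List.of(METADATA, DATA)) {
            for (Tier t : TIERS) {
                System.out.printf("%s on %s: %.4f%n", w.name(), t.name(), monthlyCost(t, w));
            }
            System.out.println("cheapest for " + w.name() + ": " + cheapest(TIERS, w).name());
        }
    }
}
```

With these made-up numbers the metadata workload costs about 2.02 on the frequent-access tier versus 50.01 on the infrequent-access tier, while the data workload costs about 200.0004 versus 100.01, so each kind of blob prefers a different tier.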
No single tier is best for both kinds of blob. I believe it would be better to keep metadata blobs in tiers with lower access costs and data blobs in tiers with lower storage costs. To achieve this, I think we should split the `storage_class` setting into separate `data_storage_class` and `metadata_storage_class` settings.
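A minimal sketch of how a repository plugin might apply the two proposed settings when writing a blob. `data_storage_class` and `metadata_storage_class` are the setting names proposed in this issue; the `StorageClassChooser` class, the fallback to the existing `storage_class` setting, and the rule that shard data blobs are exactly the blobs whose names start with `__` are all assumptions for illustration, not the actual `repository-s3` code.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: choose a storage class per blob from two settings.
 * Setting names follow the proposal; the "__" data-blob prefix is an assumption.
 */
public class StorageClassChooser {

    /** A subset of S3 storage class names, for illustration only. */
    enum StorageClass { STANDARD, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER_IR }

    private final StorageClass dataStorageClass;
    private final StorageClass metadataStorageClass;

    StorageClassChooser(Map<String, String> settings) {
        // Hypothetical fallback: use the single storage_class setting, then "standard".
        String legacy = settings.getOrDefault("storage_class", "standard");
        this.dataStorageClass = parse(settings.getOrDefault("data_storage_class", legacy));
        this.metadataStorageClass = parse(settings.getOrDefault("metadata_storage_class", legacy));
    }

    private static StorageClass parse(String value) {
        return StorageClass.valueOf(value.toUpperCase(Locale.ROOT));
    }

    /** Assumption: shard data blobs are named with a "__" prefix; everything else is metadata. */
    static boolean isDataBlob(String blobName) {
        return blobName.startsWith("__");
    }

    StorageClass storageClassFor(String blobName) {
        return isDataBlob(blobName) ? dataStorageClass : metadataStorageClass;
    }

    public static void main(String[] args) {
        StorageClassChooser chooser = new StorageClassChooser(Map.of(
                "data_storage_class", "glacier_ir",
                "metadata_storage_class", "standard"));
        System.out.println(chooser.storageClassFor("__AbCdEf123")); // prints GLACIER_IR
        System.out.println(chooser.storageClassFor("index-42"));    // prints STANDARD
    }
}
```

Falling back to `storage_class` when a new setting is absent is one hypothetical way to keep existing repositories behaving exactly as they do today.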
Relates https://github.com/elastic/elasticsearch/issues/67790: if we supported entirely separate storage locations for data and metadata blobs, it would be much harder to fill up the repository in a way that blocks the metadata operations needed for deletes.
Relates https://github.com/elastic/elasticsearch/issues/81351, because S3's Glacier Instant Retrieval tier seems ideal for data blobs but is wholly inappropriate for metadata blobs.