elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support distinct storage tiers for snapshot metadata blobs vs index data blobs #81352

Open DaveCTurner opened 2 years ago

DaveCTurner commented 2 years ago

Blob storage systems like S3 offer a variety of storage tiers which allow users to trade storage cost (a function of the volume of data stored) off against access cost (a function of the number of API calls made). Today plugins like repository-s3 let you select a single storage tier for all your snapshot data. However, there are really two different kinds of blob in the repository: snapshot metadata blobs and index data blobs.

No one tier is best for all kinds of blob. I believe it would be better to keep metadata blobs in the tiers with lower access costs and data blobs in the tiers with lower storage costs. I think we should split the storage_class setting into data_storage_class and metadata_storage_class settings to achieve this.
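
As a rough illustration of the proposed split, here is a minimal Python sketch of how a repository might resolve the storage class per blob kind. It assumes the two new setting names proposed above (they do not exist in repository-s3 today), assumes the existing storage_class setting is kept as a fallback, and uses a made-up bucket name and helper function; it is not how the plugin is implemented.

```python
from typing import Mapping

# Hypothetical setting names taken from the proposal; not existing settings.
DATA_KEY = "data_storage_class"
METADATA_KEY = "metadata_storage_class"
# Existing single setting, assumed here to remain as a fallback.
LEGACY_KEY = "storage_class"
DEFAULT_CLASS = "standard"


def storage_class_for(settings: Mapping[str, str], is_metadata: bool) -> str:
    """Pick the storage class for one blob (hypothetical helper).

    Prefers the kind-specific setting, then the single legacy setting,
    then the default class.
    """
    specific_key = METADATA_KEY if is_metadata else DATA_KEY
    if specific_key in settings:
        return settings[specific_key]
    return settings.get(LEGACY_KEY, DEFAULT_CLASS)


repo_settings = {
    "bucket": "my-snapshots",    # made-up bucket name
    METADATA_KEY: "standard",    # metadata: favour lower access cost
    DATA_KEY: "standard_ia",     # data: favour lower storage cost
}

print(storage_class_for(repo_settings, is_metadata=True))
print(storage_class_for(repo_settings, is_metadata=False))
```

Falling back to storage_class would keep existing repository definitions working unchanged, although whether that is the right compatibility story is an open question.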

Relates https://github.com/elastic/elasticsearch/issues/81351 in that S3's Glacier Instant Retrieval seems ideal for data blobs but is wholly inappropriate for metadata blobs.

Relates https://github.com/elastic/elasticsearch/issues/67790 because in fact if we supported wholly different storage locations for data and metadata blobs then we could make it much harder to fill up the repository in a way that blocks the metadata operations needed for deletes.

elasticmachine commented 2 years ago

Pinging @elastic/es-distributed (Team:Distributed)