ppf2 opened this issue 2 years ago
Pinging @elastic/es-docs (Team:Docs)
Pinging @elastic/es-distributed (Team:Distributed)
This is related to the undocumented quota-aware file system limitation in Elasticsearch for the Split Index API: https://github.com/elastic/elasticsearch/pull/88822
I've moved this to the allocation area because I think it would be better to perform these checks automatically rather than simply document some complex formula that users may or may not heed.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html
Regarding the disk usage requirement, our current documentation simply mentions:
But it is possible for the target index to use more disk space than the original index: splitting essentially clones the shards and then deletes documents from them, which leaves deleted documents for the Lucene merge process to clean up. Until these segments are merged organically over time (assuming the index continues to receive indexing activity) or via a force merge, they can take up more space than the original index.
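For illustration, a split followed by a force merge to reclaim the space held by deleted documents might look like the sketch below (the index names and shard count are hypothetical; the source index must also be made read-only before splitting):

```
PUT /my-source-index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

POST /my-source-index/_split/my-target-index
{
  "settings": {
    "index.number_of_shards": 4
  }
}

POST /my-target-index/_forcemerge?max_num_segments=1
```

Note that force merging to a single segment is itself an expensive operation and temporarily needs extra disk space of its own.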
Also, splitting into a large number of shards means more disk space overhead (which grows with the shard count), because structures such as the term dictionary are duplicated in each shard rather than shared across them.
While it may be difficult to provide a formula for exactly how much extra disk space is required, it would be helpful to document the caveats above. If there is a ballpark estimate we can offer for users who just want to be on the safe side, that would help as well (e.g., would recommending three times the size of the original index be sufficient to accommodate all the unmerged segments and the additional overhead?).
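As a rough safety check (the 3x multiplier here is only the ballpark floated above, not a documented guarantee), a user could compare the source index's primary store size against the available disk space before splitting, for example with the cat APIs:

```
GET _cat/indices/my-source-index?v&h=index,pri,pri.store.size

GET _cat/allocation?v&h=node,disk.avail
```

If `disk.avail` on the target nodes is not comfortably larger than roughly three times `pri.store.size`, the split may run the cluster into disk watermarks before segments are merged away.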
Thanks!