S3 Snapshots become very slow with more existing snapshots

ankon commented 9 years ago

I did some tests with the S3 snapshot facility to see how well it would work in a production environment. Initially things looked nice on our data set (about 500 shards, all quite small). However, after only about 2 days of hourly snapshots, the time to complete a snapshot rose from ~160s initially to ~300s, and after 7 days it had reached 1700s.

While this might be improvable by removing existing snapshots over time, it seriously limits the ability to use snapshots as backups over a long period.

Looking through the code: the issue seems to stem from the fact that the S3 snapshots use the Blob storage format, which requires being able to read many files to recover the full meta data. On a file system this might be "ok" for a long time, but on S3 accessing each of those files means API requests and network use.

I can see a short-term hack to improve this: in addition to the small meta data files, also keep an aggregated version, which gets updated when new snapshots are created. When accessing a single blob, the aggregate could be checked first, and if it exists the information is taken from there. This should be reasonably backwards-compatible, but will likely be very ugly and could have issues when the blob storage format introduces new meta data files.
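A minimal sketch of that aggregate-first lookup, to make the request savings concrete. Everything here is hypothetical: `CountingBlobStore` is an in-memory stand-in for an S3 bucket, and the blob names (`metadata-aggregate`, `metadata-<snapshot>`) are made up for illustration and are not the names used by the actual blob storage format.

```python
import json


class CountingBlobStore:
    """In-memory stand-in for an S3 bucket that counts read requests."""

    def __init__(self):
        self.blobs = {}
        self.reads = 0

    def put(self, name, data):
        self.blobs[name] = data

    def get(self, name):
        # Every get() models one S3 API request.
        self.reads += 1
        return self.blobs.get(name)


AGGREGATE = "metadata-aggregate"  # hypothetical blob name


def write_snapshot_metadata(store, snapshot, metadata):
    """Write the per-snapshot file as before, then update the aggregate."""
    store.put(f"metadata-{snapshot}", json.dumps(metadata))
    raw = store.get(AGGREGATE)
    aggregate = json.loads(raw) if raw else {}
    aggregate[snapshot] = metadata
    store.put(AGGREGATE, json.dumps(aggregate))


def read_all_metadata(store, snapshots):
    """Check the aggregate first; fall back to per-snapshot files."""
    raw = store.get(AGGREGATE)
    aggregate = json.loads(raw) if raw else {}
    result = {}
    for snapshot in snapshots:
        if snapshot in aggregate:
            result[snapshot] = aggregate[snapshot]
            continue
        # Snapshot written before the aggregate existed: read its own file.
        raw_single = store.get(f"metadata-{snapshot}")
        if raw_single is not None:
            result[snapshot] = json.loads(raw_single)
    return result
```

With this layout, reading the meta data for N snapshots costs one request when all of them are in the aggregate, plus one extra request for each "old" snapshot that is not, instead of N requests every time.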

In the long run it seems it would make more sense to stop using the blob storage format and build something that uses S3's strengths better and avoids its weaknesses. I'm not sure yet what this could look like though :)

So, some questions:

What do you think? Has this appeared before? Is any of this on the roadmap?
Would it make sense to try coming up with patches for both the short-term and the long-term ideas?
How are S3 snapshots intended to be used?

ankon commented 9 years ago

Outcome of a quick discussion with @bleskes:

Being able to do a 1:1 copy from a local snapshot to a S3 snapshot isn't necessarily needed, so the formats could differ.
Being able to read "old" snapshots is required.
Being able to continue writing "old" snapshots is very much desirable.
Being able to do an in-place upgrade of a repository to a different storage format is nice-to-have, but not required.
Changing the internal API of ES should be avoided, although backwards compatible changes (like pulling in another layer) could be ok.

Both the short-term and long-term approaches would be doable with those requirements, from what I can see. At the moment this issue is semi-critical for us, but I can see it becoming an issue in the mid-term. At that point I'll likely take a closer look at how to implement the long-term approach (if no one beats me to it :D).

tlrx commented 9 years ago

@ankon sorry, I only just saw this issue... The problem is well known and is not specific to S3. An improvement is about to be merged (#8969), and it is very similar to what you suggested here.

dadoonet commented 9 years ago

@tlrx so should we close the issue here?

ankon commented 8 years ago

Some explicit links:

https://github.com/elastic/elasticsearch/issues/8958 is the issue mentioned above, with the PR merged into v2.0.0-beta1
#150 is pretty much the same thing, albeit on the deletion side.

Update from my side: I cannot update to 2.0.0, so I'll instead use a different strategy to manage snapshots -- probably by actually rolling the repositories, and then selectively wiping the old ones directly.
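A rough sketch of what such a rolling scheme could look like, assuming one repository per ISO week. The naming pattern (`snapshots-<year>-w<week>`) and the helper names are hypothetical and only illustrate the bookkeeping; registering repositories and wiping the corresponding S3 prefixes is left out.

```python
from datetime import date, timedelta


def repository_for(day, prefix="snapshots"):
    """Name of the (hypothetical) weekly repository that covers `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}-w{iso_week:02d}"


def repositories_to_wipe(existing, today, keep_weeks, prefix="snapshots"):
    """Return the repositories in `existing` outside the last `keep_weeks` weeks."""
    keep = {
        repository_for(today - timedelta(weeks=i), prefix)
        for i in range(keep_weeks)
    }
    return sorted(name for name in existing if name not in keep)
```

Because each repository only ever holds about a week of hourly snapshots, the number of existing snapshots per repository stays bounded, which is the point of rolling them.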

elastic / elasticsearch-cloud-aws

S3 Snapshots become very slow with more existing snapshots #174