elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Snapshot repository registration api call causing "out of memory" errors #10344

Closed — nicktgr15 closed this issue 9 years ago

nicktgr15 commented 9 years ago

We are running an elasticsearch cluster with 27 nodes and we create about 200 new indices per day. We keep data for the last 5 days, so in total we have about 1000 indices. Every day we take a snapshot of yesterday's 200 indices in S3 (so, no incremental backups).
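For reference, a daily snapshot request looks roughly like the following (the snapshot name and index pattern here are illustrative):

curl -XPUT 'http://localhost:9200/_snapshot/s3_repository/snapshot_2015_03_30?wait_for_completion=false' -d '{
    "indices": "index-2015.03.30*",
    "ignore_unavailable": true,
    "include_global_state": false
}'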

After updating to 1.4.2 we've noticed that when we try to register a snapshots repository we end up with the master node running out of heap space and the cluster going into an unresponsive state. I'm attaching a screenshot where you can see the heap space usage on the master node after making a PUT request to register a snapshots repository.

[Screenshot: jmx-master — heap space usage on the master node]

After inspecting a heap dump taken from the master node we realised that it's trying to list the contents of our S3 repository. At the moment we keep all previous snapshots in the repository, which means that it's impossible (in terms of time and resources) to list everything. I'm attaching a screenshot from Memory Analyser where you can see that 51% of the heap space (800 MB) is occupied by a Map storing 5,000,000 entries, with PlainBlobMetadata objects as values and S3 locations as keys.

[Screenshot: memoryanalyser_tree — Memory Analyser view of the heap dump]

We recently updated to 1.4.2 (from 1.1.2) and we don't think we saw similar behaviour (i.e. listing the S3 repository contents) in 1.1.2. Could this be an issue that needs further investigation on your side, or could the way we are currently using the snapshot service be causing us problems?

Any suggestions/feedback would be welcome.

Thank you

imotov commented 9 years ago

@nicktgr15 could you post the stack trace for the OOM error?

tlrx commented 9 years ago

we create about 200 new indices per day

How many shards do those 200 indices represent? What's their total size?

nicktgr15 commented 9 years ago

Thanks for the responses!

@imotov I'm currently reproducing the issue in order to get a full stack trace, which I will attach to the ticket as soon as I have it.

@tlrx There are 7 primary shards and 7 replicas per index, so each day's 200 indices are represented by 2800 shards. The total number of shards (for the 5-day retention policy) is 14000. The total size of the 200 indices created per day is close to 400GB (including replicas).

tlrx commented 9 years ago

@nicktgr15 thanks for your quick response

We suspect that the verification process failed in your case. Can you please try to register the repository without verification? You have to set verify: false when registering the repo; see the documentation here.

nicktgr15 commented 9 years ago

I'm attaching a screenshot from jConsole showing the heap space utilisation in both cases with verify: true (default) and verify: false. For the two cases we got the following output:

verify: true (default)

curl -XPUT 'http://localhost:9200/_snapshot/s3_repository' -d '{
>     "type": "s3",
>     "settings":{"region":"eu-west-1","base_path":"pyxis/test/snapshots","max_restore_bytes_per_sec":"100mb","bucket":"prod-s3-common-storagebucket-sl9vplg1o48d"}
> }'
{"error":"OutOfMemoryError[Java heap space]","status":500}
[2015-03-31 15:43:25,265][INFO ][repositories             ] [Master Menace] update repository [s3_repository]
[2015-03-31 15:53:11,375][WARN ][repositories             ] [Master Menace] [s3_repository] failed to finish repository verification

verify: false

curl -XPUT 'http://localhost:9200/_snapshot/s3_repository' -d '{
>     "type": "s3",
>     "settings":{"region":"eu-west-1","base_path":"pyxis/test/snapshots","max_restore_bytes_per_sec":"100mb","bucket":"prod-s3-common-storagebucket-sl9vplg1o48d", "verify": false}
> }'
{"error":"AmazonClientException[Unable to unmarshall response (Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler). Response Code: 200, Response Text: OK]; nested: AmazonClientException[Failed to sanitize XML document destined for handler c
[2015-03-31 15:56:21,785][INFO ][repositories             ] [Master Menace] update repository [s3_repository]
[2015-03-31 16:05:33,932][WARN ][repositories             ] [Master Menace] [s3_repository] failed to finish repository verification

[Screenshot: repository-verify — jConsole heap space utilisation in both cases]

We reduced the heap space to 300m in order to trigger the out of memory error sooner. As you can see in the first block, there is an "out of memory" error but no stack trace was produced.

This was not reproduced on the live environment, so the load conditions were not the same.

tlrx commented 9 years ago

@nicktgr15 thanks for the information. I think I can now see what happened in your cluster.

In elasticsearch 1.4 we added repository verification. When registering a repository, each node of the cluster tries to write a file in the repository before the repository registration is acknowledged.

The current implementation of the verification process lists all the files at the root level of the repository... and in your case it represents too much data for the Java XML API so the node hits an OOM.

First, I've created pull request #10366 to avoid listing all files when verifying a repository.

Second, there is an error in the documentation. To disable repository verification you need to add the URL parameter verify=false when registering the repository. I've created pull request #10365 to update the docs, but you can already try to register the repository with this parameter.
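For example, reusing the settings from your earlier request, the registration would look like this:

curl -XPUT 'http://localhost:9200/_snapshot/s3_repository?verify=false' -d '{
    "type": "s3",
    "settings":{"region":"eu-west-1","base_path":"pyxis/test/snapshots","max_restore_bytes_per_sec":"100mb","bucket":"prod-s3-common-storagebucket-sl9vplg1o48d"}
}'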

Last thing: in your case there is a good chance that disabling the repository verification will just defer the OOM. The snapshot process needs to list all the files of the repository to be able to make an incremental backup. The upcoming #8969 will improve the creation and deletion of snapshots in use cases like yours, but you should consider creating fewer indices (200 indices with 7 primary shards and 7 replicas = 2800 shards per day; 400GB / 2800 ≈ 142MB per shard, whereas a shard could handle GBs of data).

You can also consider creating multiple repositories, for example snapshotting newly created indices into a new repository every day; a rough sketch follows.
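A possible date-scoped registration (the repository name and the base_path suffix are purely illustrative) could look like:

curl -XPUT 'http://localhost:9200/_snapshot/s3_repository_2015_04?verify=false' -d '{
    "type": "s3",
    "settings":{"region":"eu-west-1","base_path":"pyxis/test/snapshots/2015-04","bucket":"prod-s3-common-storagebucket-sl9vplg1o48d"}
}'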

nicktgr15 commented 9 years ago

Thanks for the update @tlrx. As a temporary workaround we will be rotating the snapshot repository on a monthly basis. We didn't have a chance to try again with verify: false as a URL parameter; if we do, I will provide an update.

Regarding your suggestion about the number of shards in the cluster, is there a performance impact on the snapshotting process because of this?

In general, we know that there is a suggested 10GB upper limit for the shard size and, based on what you describe, we should aim for a minimum of 1GB. Does this sound like the right strategy to optimise the number of shards we currently have in the cluster? At this point we can't reduce the number of indices, as it is a requirement the system should satisfy.
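For context, we check the actual per-shard store sizes with the cat shards API, e.g.:

curl -XGET 'http://localhost:9200/_cat/shards?v'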

tlrx commented 9 years ago

Regarding your suggestion about the number of shards in the cluster, is there a performance impact on the snapshotting process because of this?

Definitely. The snapshot process iterates over each file of the shards to process and upload them. The more shards you have, the more files need to be snapshotted.

Does this sound like the right strategy to optimise the number of shards we currently have in the cluster?

Hard to tell if it's the right strategy without knowing exactly what your use case is. But from what I know of your cluster, I feel you would benefit from creating fewer but larger indices.
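Purely as an illustration (the template name, index pattern and shard counts here are hypothetical), fewer shards per new index could be configured with an index template:

curl -XPUT 'http://localhost:9200/_template/daily_indices' -d '{
    "template": "index-*",
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    }
}'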

At this point we can't reduce the number of indices as it is a requirement the system should satisfy.

I'm curious to understand why the number of indices is a requirement for your system?

nicktgr15 commented 9 years ago

It was a requirement when the system was designed, as it simplified the way we managed client data (e.g. backups per index, permissions). However, we are going to review our current approach, as the high number of indices seems to be an overhead. Thank you for your help.

tlrx commented 9 years ago

@nicktgr15 ok thanks, that's what I suspected.

You may be interested in the chapter "Designing for Scale" of the book Elasticsearch: The Definitive Guide. It's a great chapter that deals with use cases like yours.

I'm closing this issue since the pull requests mentioned above are already open. Feel free to reopen if needed.