Let's keep the discussion on the forum, as cross-posting just fractures discussions and makes them harder to follow. We can reopen this if there is a verified bug.
@jasontedor If you'd like to. However, I strongly believe this is a major bug in ES.
I'm seeing the same problem and have added a comment to the forum.
@jasontedor there are multiple reports regarding this issue (the original thread mentioned here, as well as this one and this one, which is actually reproducing this behavior on Elastic Cloud), and there isn't much traction on the forum discussions which are being closed due to inactivity.
Can you advise how we can get this looked at? Is there any additional information we (users who are running into this issue) can collect and provide?
@jasontedor it's kind of alarming that no one from Elastic is considering this an issue. We are not talking about millisecond delay differences between ES 5 and older versions. We are talking minutes here, and in my case it was hours.
I think what you're all seeing is due to the fact that starting in 5.0.0 a refresh is forced if a get is performed after a document was updated but before that update has been made visible to search. During an update request, a get is issued as part of executing the update operation. Therefore, this obviously has an impact on performing frequent updates to the same document. This is documented in the migration docs. I would encourage you to batch update operations to the same document on the client side.
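A minimal sketch of that kind of client-side batching (not from the thread), assuming the official elasticsearch-py client; index and field names are placeholders:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def bulk_update_deduped(index, partial_docs):
    # partial_docs: iterable of (doc_id, {field: value, ...}) pairs.
    # Merge all partial updates for the same ID so each document is touched
    # at most once per bulk request, avoiding the repeated
    # update -> get -> forced-refresh cycle described above.
    merged = {}
    for doc_id, fields in partial_docs:
        merged.setdefault(doc_id, {}).update(fields)

    actions = (
        {"_op_type": "update", "_index": index, "_id": doc_id, "doc": fields}
        for doc_id, fields in merged.items()
    )
    return helpers.bulk(es, actions)

# Usage: ten updates to the same ID collapse into a single bulk action.
bulk_update_deduped("my-index", [("doc-1", {"counter": i}) for i in range(10)])
```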
@jasontedor unless I am missing something, I do not see how this applies to my case in particular, where only a single update is being issued to a document. And even then, refresh is relatively fast, only a couple of seconds on my cluster, which does not explain why it takes a full hour to update 100 documents in bulk but only several milliseconds when pulling the document to another server, updating some field values, and re-inserting it, and that also includes network transfer and latency times.
@jasontedor and BTW my current resolution to the bulk update delays involves pulling the document using the get API, which according to the documentation you provided will issue a refresh on the cluster the same way bulk updates do. So I do not see why bulk updates get delayed for several seconds while using get/insert takes no time at all.
I see three distinct reports here (please forgive me if I'm missing any):
In the last two, it is clear that the issue is exactly what I mentioned: forced refresh. The users are updating the same document ID, and in your thread one of the same users (@dizzzyroma) provided a hot threads output that clearly shows refresh is the cause. I consider those two resolved.
For your issue, it appears that might indeed not be the case. You say that you're not updating the same ID, and your monitoring charts do not show the number of segments increasing rapidly. I asked you for hot threads or profiler output to further triage. Without that, it will be difficult for us to assess what is going on.
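As a rough illustration (not from the thread), hot threads output can be captured while the slow bulk updates are running, for example with the official elasticsearch-py client; the host is a placeholder:

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Take a few snapshots during the slow bulk updates so the output shows
# whether refresh (or something else) dominates the hot threads.
for _ in range(3):
    print(es.nodes.hot_threads())
    time.sleep(5)
```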
@jasontedor, based on the migration docs, this behavior for the GET API can be disabled by passing realtime=false. Is it possible to implement this for the update API?
No, because updates need to be sure they have the latest version of the doc to avoid losing changes.
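For context, a rough sketch of what that flag does on the GET API (not the update API), assuming the official elasticsearch-py client and a placeholder index:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# realtime=False reads from the last searchable (refreshed) copy instead of
# forcing a refresh when the document has pending changes; this flag only
# exists for plain gets.
doc = es.get(index="my-index", id="doc-1", realtime=False)

# The update API has no such flag: it must read the latest version of the
# document before applying the partial update, or concurrent changes could
# be silently lost.
```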
Semi-related, but I'm seeing very slow bulk API update requests on 6.0... I opened a topic here: https://discuss.elastic.co/t/slow-bulk-api-requests-es-6-0-beta2/100859 but am really getting ridiculously slow update speeds.
ES is constantly busy with "warmer" requests when looking at hot_threads.
This is a blocker for using it in production for me.
Hi all,
I'm running into the same issue with ES 6.2.3.
I'm using the update API to insert documents into an index that I would like to rotate weekly. The cluster is made of 8 blades.
The data flows in from a Kafka queue containing daily deduped data, out of which I craft the documents to be sent to ES. During the first day, when the documents are mostly unique, I get brilliant performance, around 10K updates per second, using batches of 5000 documents and a few threads; I could probably push my hardware to bigger numbers but I don't need to.
After 24h, when documents with the same IDs start coming in, performance drops miserably to 500 documents per second or even less, then the CPU skyrockets and iowait eats most of my resources. I tried changing the batch size and the number of threads, but found no solution.
I want to underline that you don't need duplicate documents in the same bulk request to trigger this ugly performance issue. You just need to update a few existing documents per request to make ES unusably slow.
I hope that someone will look into this issue sooner or later ...
@ctrix there are potentially some things you can do to sidestep the issue (at least it sounds like that from your description). If you open up a thread on discuss.elastic.co we can try to figure it out with you there.
@bleskes thanks for the reply. For the record, I've posted my problem here, where I hope to get a few helpful follow-ups.
FYI https://github.com/elastic/elasticsearch/pull/29264 might resolve this issue
Hi,
Recently I encountered an issue after migrating my cluster from 1.7 to 5.2. I have experienced extremely slow bulk updates, to the extent where it took 1 week to update 1M documents, whereas in 1.7 those took minutes.
I have reported this on Elastic Discuss and did not get any valid answer for why I am experiencing this kind of slowness.
As you can see from the discussion thread, I have tried many things: set the number of replicas to 0, set the refresh interval to -1, increased the CPU and memory of my nodes, increased the size of the cluster from 5 to 9 nodes, and since the cluster was on AWS EC2 I changed the disk type to provisioned IOPS with 10000 IOPS. I also tried changing the bulk size from 100 to 500 to 1000 to 10000, and every increase made matters worse.
None of the above made any difference to the bulk update speed. Bulk inserts only took milliseconds, as did update by query and search.
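For reference, a minimal sketch (not the reporter's actual code) of how the replica and refresh-interval settings mentioned above can be applied, assuming the official elasticsearch-py client and a placeholder index name:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Disable replicas and periodic refreshes for the duration of the bulk load.
# Note that this does not help here: an update can still force a refresh
# when it needs to read a document with pending changes.
es.indices.put_settings(
    index="my-index",
    body={"index": {"number_of_replicas": 0, "refresh_interval": "-1"}},
)
```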
I was then forced to switch the code to pull the document, update the necessary fields on the server, then re-insert the updated document into ES. This sped up the process significantly, as I managed to update 1M documents in 10 minutes. I could likely increase this rate further, as I did not see any change in CPU or memory usage during those 10 minutes.
All this led me to believe that there are definitely some major flaws in the way ES handles bulk updates, since the slowness does not make any sense and since the GET/UPDATE/INSERT approach should, in theory, take longer than using the update API.
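A minimal sketch of that workaround (not the reporter's actual code), assuming the official elasticsearch-py client; index and field names are placeholders:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def get_modify_reinsert(index, changes):
    # changes: {doc_id: {field: new_value, ...}}
    # Fetch each document, merge the new field values in the application,
    # and re-index the whole document via a single bulk request instead of
    # sending partial updates through the update API.
    actions = []
    for doc_id, fields in changes.items():
        source = es.get(index=index, id=doc_id)["_source"]
        source.update(fields)
        actions.append({
            "_op_type": "index",
            "_index": index,
            "_id": doc_id,
            "_source": source,
        })
    return helpers.bulk(es, actions)
```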
Looking at the attached image, I do not see any significant changes in the indexing and search rates since I stopped using bulk updates on Mar 26th, as indicated by the red line.
I hope you can review the way bulk updates are currently handled, because even though I managed to avert that disaster, I believe it should not be this way at all and should be much faster than what I was experiencing.
Currently my ES cluster is on version 5.2. JVM: Java SE 1.8.0_121. OS: Ubuntu 14.04 64-bit. Installed plugins: analysis-icu, analysis-kuromoji, analysis-smartcn, discovery-ec2, repository-s3, x-pack.