elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

elasticsearch sometimes starts consuming a lot of CPU indefinitely #10437

Closed by erjiang 9 years ago

erjiang commented 9 years ago

I am running a logs server on AWS. The total CPU usage is typically 3-5%.

Sometimes, elasticsearch starts taking up much more CPU and stays at that level until I SSH into the server and restart it.

The image below shows what typically happens: at an arbitrary time, CPU load increases dramatically (even though there is no corresponding increase in incoming data) and stays there until I log in to the server and restart elasticsearch.

[image: elasticsearch CPU usage graph]

I am running 1.5.0 on Ubuntu 14.04 using the official repository package.

Please let me know what additional information I should provide and how to get that information.

kimchy commented 9 years ago

When the server is under high CPU load, can you first check whether it's the ES process, and if so, can you run the hot threads API and post the response back here? http://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
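For reference, a minimal way to capture that output from the command line (a sketch, assuming the node listens on the default localhost:9200):

curl 'localhost:9200/_nodes/hot_threads'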

erjiang commented 9 years ago

Here is the output of hot_threads after elasticsearch started consuming extra CPU again. It is now consistently using approximately 40-60% CPU, up from 4-8%.

::: [MODAM][tB-KGWlsQ4KDt1PUMleoyQ][localhost][inet[/10.0.3.26:9300]]
   Hot threads at 2015-04-07T13:40:51.020Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

    0.0% (133.7micros out of 500ms) cpu usage by thread 'elasticsearch[MODAM][transport_client_timer][T#1]{Hashed wheel timer #1}'
     10/10 snapshots sharing following 5 elements
       java.lang.Thread.sleep(Native Method)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:445)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:364)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       java.lang.Thread.run(Thread.java:745)

Running it again, I get:

::: [MODAM][tB-KGWlsQ4KDt1PUMleoyQ][localhost][inet[/10.0.3.26:9300]]
   Hot threads at 2015-04-07T13:43:43.961Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

Please let me know what other information I can provide.

clintongormley commented 9 years ago

Hi @erjiang

Are you seeing any logs in syslog about "riding the rocket"? See https://github.com/elastic/elasticsearch/issues/10447#issuecomment-90295492
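A quick way to check for that message (assuming the default Ubuntu syslog location):

grep -i 'riding the rocket' /var/log/syslog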

erjiang commented 9 years ago

We didn't see the "riding the rocket" message.

So far our best guess is that elasticsearch used up its available Java heap space, causing GC to run continually and consume CPU cycles. That would explain why the problem appears after a day or two and goes away after restarting elasticsearch.
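One way to confirm that guess (a sketch, again assuming the default localhost:9200) is to look at the JVM section of the node stats; a heap_used_percent that stays near 100 together with steadily growing old-generation collection counts and times points to GC thrashing:

GET /_nodes/stats/jvm?human&pretty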

clintongormley commented 9 years ago

OK, so this sounds like memory pressure. Are you using a lot of fielddata (i.e. per-doc field values loaded for sorting, aggregations, or scripting)? You can check with:

GET /_nodes/stats/indices/fielddata?fields=*&human&pretty

If so, consider switching those fields to use doc_values: true (which you can only set when creating a new index) to shift the memory use from your heap to the filesystem cache. This will be the new default in 2.0.
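For reference, a minimal 1.x-style mapping sketch; the index name, type name, and fields below are made up for illustration, and on 1.x doc_values only applies to not_analyzed string fields and to numeric/date fields:

PUT /logs-new
{
  "mappings": {
    "log": {
      "properties": {
        "status":        { "type": "string",  "index": "not_analyzed", "doc_values": true },
        "response_time": { "type": "integer", "doc_values": true }
      }
    }
  }
}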

jpountz commented 9 years ago

Closing due to lack of feedback

pjcard commented 9 years ago

I too saw this issue, with HashedWheelTimer appearing as the only thread in hot threads. I've restarted my server (rather than just the process), and I'll report back on whether it recurs.

mausch commented 9 years ago

In my case, something similar was happening because I had too many indices open. Closing old indices with Curator stopped that behaviour. https://www.elastic.co/blog/curator-tending-your-time-series-indices
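Curator automates this, but the underlying operation is just the close index API; a manual sketch (the index pattern below is only an example for daily logstash indices):

POST /logstash-2015.01.*/_close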

andrenarchy commented 8 years ago

I also experienced a constant cpu load without any activity on the ES cluster. Increasing ES_HEAP_SIZE to half of my system's memory fixed it. @erjiang: Thanks for the hint!
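For the Debian/Ubuntu package on 1.x/2.x, that setting typically lives in /etc/default/elasticsearch; a sketch (the 4g value is only an example; keep it at no more than half of RAM and below roughly 30 GB so compressed object pointers stay enabled):

# /etc/default/elasticsearch
ES_HEAP_SIZE=4g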

Is there a way for ES to detect such a situation? It would be very helpful if a warning appeared in the log messages.

jasontedor commented 8 years ago

Is there a way for ES to detect such a situation? It would be very helpful if a warning appeared in the log messages.

I opened #18419.

maoxiajun commented 7 years ago

    26.2% (131ms out of 500ms) cpu usage by thread 'elasticsearch[xxx][search][T#62]'
     10/10 snapshots sharing following 10 elements
       sun.misc.Unsafe.park(Native Method)
       java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
       java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
       java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
       java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
       org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
       java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       java.lang.Thread.run(Thread.java:745)

I can see this when I run 'curl localhost:9200/_nodes/hot_threads'.

TARENDRA1994 commented 2 years ago

Hi,

Our AWS Elasticsearch cluster had high CPU utilization for one hour, i.e. 1:30 IST to 2:30 IST. How can I check what the problem was? I have not found any issues in the error logs.