Description of the problem including expected versus actual behavior:
Periodically, an Elasticsearch instance running on some Windows systems experiences a sudden spike in CPU activity in the Elasticsearch process, and the indexing rate slows. Here is a Java Mission Control flight recording that includes such a period: https://dl.dropboxusercontent.com/u/90795372/flight_recording_Elasticsearch4000.jfr
The configuration for the affected instances uses the default value for index.store.type.
We suspect that the problem is related to other processes on the system interacting with the WMI system, because the CPU spikes stop occurring when those processes are stopped.
Steps to reproduce:
Install Elasticsearch 1.5.2 or 2.3.4 on a Windows system and apply an indexing load
Start another process that interacts with the WMI system
Observe that the CPU usage of the Elasticsearch process periodically increases dramatically and the indexing rate slows
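For the "apply an indexing load" step, a minimal sketch of a load generator is shown below. It is not the load used on the affected systems. The base URL, index name, document shape, and the helper names `build_bulk_body`, `send_bulk`, and `apply_load` are all assumptions made for illustration. It only uses the `_bulk` endpoint with a `_type` in the action line, as the 1.x and 2.x versions expect. The WMI side of the reproduction is not covered here.

```python
import json
import time
import urllib.request


def build_bulk_body(index, docs, doc_type="doc"):
    """Serialize docs into the newline-delimited body expected by the _bulk API.

    The index name and doc_type are illustrative assumptions.
    """
    lines = []
    for doc in docs:
        # Action line, followed by the document source on its own line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(base_url, body, timeout=30):
    """POST one bulk body to <base_url>/_bulk and return the parsed response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def apply_load(base_url, index, batches, batch_size):
    """Send several bulk batches and return the elapsed seconds per batch."""
    timings = []
    for b in range(batches):
        docs = [{"batch": b, "seq": i, "message": "load test"} for i in range(batch_size)]
        start = time.monotonic()
        send_bulk(base_url, build_bulk_body(index, docs))
        timings.append(time.monotonic() - start)
    return timings
```

Calling something like `apply_load("http://localhost:9200", "loadtest", 1000, 500)` while the WMI-related process is running would give per-batch timings; if the problem reproduces, the slowdown should show up as a run of noticeably larger values.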
Provide logs (if relevant):
In the attached log usaseclm01.log.txt, the user reported two periods where CPU activity spiked to 100%: between 16:35 and 16:44, then again between 16:55 and 17:08. During both of these windows, Elasticsearch appears to have been performing segment merges lasting roughly 20 minutes and was otherwise unresponsive. We infer this because, during normal activity, a set of indices is deleted and re-created every 5 minutes (this can be seen elsewhere in the log file).
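The reasoning above (the regular 5-minute delete/re-create cycle going quiet during the spikes) can be checked mechanically. The sketch below is one way to do that, under two assumptions that should be verified against the real file: that log lines start with a timestamp of the form `[YYYY-MM-DD HH:MM:SS,mmm]`, and that index deletions are logged with the text `deleting index`. The function name `find_gaps` and the one-minute slack are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp prefix, e.g. "[2016-01-01 16:35:02,123]".
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def find_gaps(lines, marker="deleting index",
              expected=timedelta(minutes=5), slack=timedelta(minutes=1)):
    """Return (previous, next) timestamp pairs where the marker went quiet.

    A gap is reported when two consecutive marker lines are further apart
    than expected + slack. The marker text is an assumption about the log.
    """
    times = []
    for line in lines:
        if marker not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))

    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > expected + slack:
            gaps.append((prev, cur))
    return gaps
```

Passing the open log file (for example `open("usaseclm01.log.txt", encoding="utf-8", errors="replace")`) to `find_gaps` would list the intervals in which the periodic index deletion did not occur, which can then be compared with the reported spike windows.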
Elasticsearch version: 1.5.2 and 2.3.4
JVM version: 1.8.0_92
OS version: Windows 2012
Using information from the flight recording, we found a netty issue that might be relevant: https://github.com/netty/netty/issues/3857
usaseclm01.log.txt