ES 5.2.2 Data nodes heap suddenly jump to max, causing cluster crash

itaydvir commented 7 years ago

Elasticsearch version: 5.2.2

Plugins installed: [analysis-kuromoji, analysis-smartcn]

JVM version: 1.8.0_65

OS version: Debian 3.16.36

Description of the problem including expected versus actual behavior: On ES 5.2.2, heap usage on the data nodes suddenly jumps to the maximum (from 6GB to 15GB), causing the cluster to crash. The cluster includes:
1) 3 master nodes
2) 5 data nodes (32GB RAM, 8 cores)
3) Indices that are not very big: the most used ones contain 1.5 million docs and take 4GB of store size; the indices used for aggs contain about 6 million docs and weigh about 15GB.

Our usage is pretty basic: mainly textual search and some aggregations (sum, avg, and some that contain scripts). We make very little use of nested documents and no use of parent/child relationships.

Please read the full issue description, including graph images, here: https://discuss.elastic.co/t/upgrading-es-2-3-3-to-5-2-causing-cluster-crash.

P.S. This was a blocker for us, and after several attempts we decided to roll back to 2.3.3, so for now we are not running 5.2 anymore.

ywelsch commented 7 years ago

We need a heap dump here to see what's going on.
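Before taking a heap dump it helps to know which data node is close to its heap limit. The following is a minimal sketch, assuming the JSON response of `GET _nodes/stats/jvm` has already been fetched; the function name, the 90% threshold, and the sample payload (node IDs, names, byte counts) are hypothetical and only illustrate the shape of the check.

```python
import json


def nodes_near_max_heap(stats, threshold_percent=90):
    """Return (node name, heap used %) pairs at or above the threshold.

    `stats` is the parsed body of a node stats response that contains
    `nodes.<id>.name` and `nodes.<id>.jvm.mem.heap_*_in_bytes`.
    """
    flagged = []
    for node in stats.get("nodes", {}).values():
        mem = node["jvm"]["mem"]
        used_pct = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
        if used_pct >= threshold_percent:
            flagged.append((node["name"], round(used_pct, 1)))
    # Most loaded node first, so it is the first candidate for a heap dump.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample payload, not taken from the cluster in this issue.
SAMPLE = json.loads("""
{
  "nodes": {
    "id-1": {"name": "data-1",
             "jvm": {"mem": {"heap_used_in_bytes": 6442450944,
                             "heap_max_in_bytes": 16106127360}}},
    "id-2": {"name": "data-2",
             "jvm": {"mem": {"heap_used_in_bytes": 15300820992,
                             "heap_max_in_bytes": 16106127360}}}
  }
}
""")

if __name__ == "__main__":
    for name, pct in nodes_near_max_heap(SAMPLE):
        print(f"{name}: heap at {pct}% of max")
```

A node flagged this way would be the one to capture a heap dump from, using whatever JVM tooling is available on that host.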

itaydvir commented 7 years ago

I don't have one at the moment; I have rolled back to 2.3.3. In the next couple of weeks I will build a parallel cluster with 5.2.2 in order to recreate the issue without affecting production services.

jasontedor commented 7 years ago

There is nothing that we can do here without a heap dump or a reproduction. Please open a new issue if you are able to reproduce this.

elastic / elasticsearch, issue #23868