elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Elasticsearch data corruption after disk expansion #15128

Closed Idofriedman closed 8 years ago

Idofriedman commented 8 years ago

When expanding an existing disk on a machine (EC2), Elasticsearch does not work correctly with the newly allocated space.

It does recognize that the new space is available and allocates shards to the node, BUT once the shards get there the error below is reported:

[2015-11-30 00:01:49,502][WARN ][cluster.action.shard ] [ES01] [indexname][18] received shard failed for [indexname][18], node[xvIIzsByRt-Ku-t-wLSKyA], relocating [5QhU1g2rQDK-AqTyUMmYMw], [P], s[INITIALIZING], indexUUID [rrS6H6hHTkixYnmTv3th2Q], reason [shard failure [failed recovery][RecoveryFailedException[[indexname][18]: Recovery failed from [ES08][5QhU1g2rQDK-AqTyUMmYMw][PreProd-ElasticSearch-m4.xl-ES08][inet[/XXX.XX.X.XXX:9300]]{master=true} into [ES12][xvIIzsByRt-Ku-t-wLSKyA][ip-XXX-XX-X-XXX][inet[/XXX.XX.XX.XXX:9300]]{master=true}]; nested: RemoteTransportException[[ES08][inet[/XXX.XX.X.XXX:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[indexname][18] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[indexname][18] Failed to transfer [127] files with total size of [5gb]]; nested: RemoteTransportException[[ES12][inet[/XXX.XX.X.XXX:9300]][internal:index/shard/recovery/file_chunk]]; nested: FileNotFoundException[/es_data/ebs/PreProd/nodes/0/indices/indexname/18/index/recovery.1448841041953.segments_1x (No space left on device)]; ]

The error is reported although there is plenty of free space, and the new space is reflected in the node stats as well.

In most cases the issue is resolved by rebooting the node. After the reboot, ES appears to pick up the new space more completely and the allocation succeeds.

The big problem here is that in most cases some of the data ended up corrupted.

Thanks

clintongormley commented 8 years ago

What version of Elasticsearch are you on? Are you using multiple data paths per node? Also, this exception looks like it comes from the OS, not from Elasticsearch:

No space left on device

Idofriedman commented 8 years ago

ES v1.7.3, one data path.
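Since the "No space left on device" message appears to come from the OS rather than from Elasticsearch, one way to cross-check is to ask the OS directly what it sees for the data path and compare that with the node stats. Below is a minimal sketch in Python, not anything from Elasticsearch itself. The function name `os_view_of_path` is hypothetical, and the path passed on the command line is assumed to be the node's data path. It also reports inode counts, since a filesystem can refuse writes when it runs out of inodes even if free bytes remain.

```python
import os
import shutil
import sys


def os_view_of_path(path):
    """Return the byte and inode counts the OS reports for `path`.

    Hypothetical helper for illustration; Unix-only because of os.statvfs.
    """
    usage = shutil.disk_usage(path)  # total/used/free, in bytes
    vfs = os.statvfs(path)           # filesystem stats, including inodes
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "total_inodes": vfs.f_files,
        "free_inodes": vfs.f_favail,
    }


if __name__ == "__main__":
    # Assumption: the first argument is the node's data path;
    # defaults to the current directory so the script runs as-is.
    data_path = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, value in os_view_of_path(data_path).items():
        print(f"{name}: {value}")
```

If the totals printed here still show the old disk size after the expansion, the discrepancy would be at the filesystem level rather than in Elasticsearch.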

clintongormley commented 8 years ago

No further info. Closing