elastic / elasticsearch-cloud-aws

AWS Cloud Plugin for Elasticsearch
https://github.com/elastic/elasticsearch/tree/master/plugins/discovery-ec2

Repeated "Read timed out" errors when recovering large shards from an S3 repository #157

Open · cregev opened this issue 9 years ago

cregev commented 9 years ago

When I try to restore a large index (750 GB split across 6 shards) from S3, "Read timed out" errors are raised and the restore never completes. It looks like nothing is happening.

Details about our Elasticsearch cluster:

Elasticsearch version: 1.4.2
AWS Cloud plugin version: 2.4.1

[2014-12-29 00:51:20,706][WARN ][indices.cluster ] [es-test-hist01] [2014_11][1] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [2014_11][1] failed recovery
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [2014_11][1] restore failed
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:130)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:127)
    ... 3 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [2014_11][1] failed to restore snapshot [es-test-transfer]
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:165)
    at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService.restore(IndexShardSnapshotAndRestoreService.java:124)
    ... 4 more
Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: [2014_11][1] Failed to recover index
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:787)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository.restore(BlobStoreIndexShardRepository.java:162)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:577)
    at sun.security.ssl.InputRecord.read(InputRecord.java:532)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:911)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
    at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
    at org.elasticsearch.index.snapshots.blobstore.SlicedInputStream.read(SlicedInputStream.java:92)
    at java.io.InputStream.read(InputStream.java:101)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restoreFile(BlobStoreIndexShardRepository.java:834)
    at org.elasticsearch.index.snapshots.blobstore.BlobStoreIndexShardRepository$RestoreContext.restore(BlobStoreIndexShardRepository.java:784)
    ... 6 more
[2014-12-29 00:51:20,734][WARN ][cluster.action.shard ] [es-test-hist01] [2014_11][1] sending failed shard for [2014_11][1], node[BU9hbOrJSnmdASggfIzEEg], [P], restoring[my_s3_repository:es-test-transfer], s[INITIALIZING], indexUUID [k9lXKiIDQGe-2zqPBsP78w], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[2014_11][1] failed recovery]; nested: IndexShardRestoreFailedException[[2014_11][1] restore failed]; nested: IndexShardRestoreFailedException[[2014_11][1] failed to restore snapshot [es-test-transfer]]; nested: IndexShardRestoreFailedException[[2014_11][1] Failed to recover index]; nested: SocketTimeoutException[Read timed out]; ]]
[2014-12-29 01:04:47,413][WARN ][indices.cluster ] [es-test-hist01] [2014_11][3] failed to start shard :1
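The root cause at the bottom of the trace is `java.net.SocketTimeoutException: Read timed out`, thrown from `SocketInputStream.socketRead0` while the restore was reading the S3 response stream. The self-contained Java sketch below reproduces that exception locally to show what it means. It assumes a loopback server standing in for S3, and the class and method names (`ReadTimeoutDemo`, `stalledReadTimesOut`, `slowStreamBytesRead`) are hypothetical, not part of Elasticsearch, the cloud-aws plugin, or the AWS SDK. The point it illustrates: a socket's `SO_TIMEOUT` (presumably set by the AWS SDK's HTTP client on its connections) bounds how long a single blocking `read()` may wait for the next bytes, not how long the whole download may take.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; a loopback server stands in for the S3 endpoint.
class ReadTimeoutDemo {

    // The peer accepts the connection but never writes, so the blocking read()
    // waits until SO_TIMEOUT expires. Returns true if SocketTimeoutException was thrown.
    static boolean stalledReadTimesOut(int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {
            client.setSoTimeout(soTimeoutMillis); // limits each individual read() call
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false;
            } catch (SocketTimeoutException e) {
                return true; // same exception type as the root cause in the trace
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The peer sends `count` bytes, sleeping `gapMillis` before each one, then closes
    // its output. Returns the number of bytes read before EOF. As long as every gap is
    // shorter than SO_TIMEOUT, no timeout occurs even if the whole transfer takes longer.
    static int slowStreamBytesRead(int count, int gapMillis, int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {
            Thread writer = new Thread(() -> {
                try {
                    OutputStream out = peer.getOutputStream();
                    for (int i = 0; i < count; i++) {
                        Thread.sleep(gapMillis);
                        out.write(i);
                        out.flush();
                    }
                    peer.shutdownOutput(); // signals EOF to the reader
                } catch (IOException | InterruptedException e) {
                    // Stop writing; the reader then fails with a timeout instead of EOF.
                }
            });
            writer.setDaemon(true);
            writer.start();
            client.setSoTimeout(soTimeoutMillis);
            InputStream in = client.getInputStream();
            int total = 0;
            while (in.read() != -1) {
                total++;
            }
            writer.join();
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `slowStreamBytesRead(15, 50, 500)` takes roughly 750 ms in total yet returns 15 without a timeout, while `stalledReadTimesOut(200)` fails after about 200 ms. By the same reasoning, the shard size alone (about 125 GB per shard on average, 750 GB / 6) would not trigger this error; the trace suggests that at some point no data arrived from S3 for longer than the client's socket timeout.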

Please advise.

Thanks, Costya.