elastic / elasticsearch-migration

This plugin will help you to check whether you can upgrade directly to the next major version of Elasticsearch, or whether you need to make changes to your data and cluster before doing so.

Migration reindex helper fails when underlying reindex task appears to have succeeded #71

Closed ppf2 closed 7 years ago

ppf2 commented 7 years ago

I have a .watches index from ES 1.7 + Watcher 1.0.1. It checks out fine with the migration plugin on ES 1.7 (i.e. OK to upgrade to ES 2.x without changes). Once on ES 2.x, reindexing this index with the 2.x migration plugin (to prepare for the 5.x upgrade) fails:

[image: not preserved]

[2016-10-12 20:13:25,474][INFO ][cluster.metadata         ] [Gibbon] [.watches-2.4.0] creating index, cause [api], templates [], shards [1]/[0], mappings [watch]
[2016-10-12 20:13:25,534][INFO ][cluster.routing.allocation] [Gibbon] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.watches-2.4.0][0]] ...]).
[2016-10-12 20:13:25,676][INFO ][tasks                    ] 566 finished with response ReindexResponse[took=38ms,updated=0,created=2,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,indexing_failures=[],search_failures=[]]
[2016-10-12 20:13:25,690][DEBUG][action.admin.cluster.node.tasks.list] [Gibbon] failed to execute on node [Rvrt5OjbTGi3bRtRt_rVBg]
RemoteTransportException[[Gibbon][127.0.0.1:9300][cluster:monitor/tasks/lists[n]]]; nested: ResourceNotFoundException[task [Rvrt5OjbTGi3bRtRt_rVBg:566] is missing];
Caused by: ResourceNotFoundException[task [Rvrt5OjbTGi3bRtRt_rVBg:566] is missing]
    at org.elasticsearch.action.support.tasks.TransportTasksAction.processTasks(TransportTasksAction.java:160)
    at org.elasticsearch.action.admin.cluster.node.tasks.list.TransportListTasksAction.processTasks(TransportListTasksAction.java:83)
    at org.elasticsearch.action.admin.cluster.node.tasks.list.TransportListTasksAction.processTasks(TransportListTasksAction.java:50)
    at org.elasticsearch.action.support.tasks.TransportTasksAction.nodeOperation(TransportTasksAction.java:113)
    at org.elasticsearch.action.support.tasks.TransportTasksAction.access$1000(TransportTasksAction.java:65)
    at org.elasticsearch.action.support.tasks.TransportTasksAction$NodeTransportHandler.messageReceived(TransportTasksAction.java:329)
    at org.elasticsearch.action.support.tasks.TransportTasksAction$NodeTransportHandler.messageReceived(TransportTasksAction.java:325)
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
    at org.elasticsearch.shield.transport.ShieldServerTransportService$ProfileSecuredRequestHandler.messageReceived(ShieldServerTransportService.java:188)
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:77)
    at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
[2016-10-12 20:13:25,758][INFO ][cluster.metadata         ] [Gibbon] updating number_of_replicas to [0] for indices [.watches-2.4.0]
[2016-10-12 20:13:25,761][INFO ][index.shard              ] [Gibbon] [.watches-2.4.0][0] updating refresh_interval from [-1] to [1s]

Per the log above, the underlying reindex task appears to have succeeded: it reindexed the 2 documents and created the new index.

green  open   .watches-2.4.0                    1   0          2            0      4.3kb          4.3kb 
green  open   .watches                          1   0          2            0      4.2kb          4.2kb 

But because of the failure above, the reindex helper never completed the remaining steps of its routine. Since the underlying reindex API reports a ReindexResponse with the expected 2 documents created, I am filing this against the migration plugin.
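
For illustration, here is a minimal sketch of how a helper could tolerate this race. It is not the plugin's actual code. It assumes that a "task is missing" error while polling means the task has already finished, and then confirms the result by comparing document counts. `TaskMissing`, `get_task`, `count_docs`, and `wait_for_reindex` are all hypothetical names standing in for whatever client calls the helper really uses.

```python
import time


class TaskMissing(Exception):
    """Hypothetical stand-in for the 'task [...] is missing' error."""


def wait_for_reindex(get_task, count_docs, src, dest,
                     poll_interval=0.0, max_polls=100):
    """Poll a reindex task; treat a missing task as finished, then verify.

    get_task:   callable returning task info, or raising TaskMissing
                once the task is no longer in the task list (assumption).
    count_docs: callable mapping an index name to its document count.
    Returns True if src and dest hold the same number of documents.
    """
    for _ in range(max_polls):
        try:
            get_task()
        except TaskMissing:
            # The task is gone from the task list, so stop polling
            # instead of failing the whole helper routine.
            break
        time.sleep(poll_interval)
    else:
        # Loop ended without the task ever disappearing.
        raise TimeoutError("reindex task still running after max_polls")

    # Do not trust the missing task alone: check the outcome directly.
    return count_docs(src) == count_docs(dest)
```

With the counts from the listing above (2 documents in both .watches and .watches-2.4.0), a check like this would return True and the helper could carry on with its remaining steps.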

clintongormley commented 7 years ago

Pushed a fix which catches the inability to delete the .watcher index correctly. Closing in favour of #79