elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Tests: RareClusterStateIT.testUnassignedShardAndEmptyNodesInRoutingTable failed #21463

Closed spinscale closed 6 years ago

spinscale commented 7 years ago

Elasticsearch version: 5.x branch

Details at https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.x+multijob-unix-compatibility/os=fedora/214/consoleFull

Two exceptions show up here (in my local run I don't get any exceptions in the logs): one is a ClassCastException from trying to cast LocalTransportAddress to InetSocketTransportAddress (as the test does not mock zenpings), and the other is a NoSuchFileException for an index during cleanup.

@jpountz or @bleskes can you take a look maybe?

bleskes commented 7 years ago

I briefly looked at the NoSuchFileException. I suspect it is a concurrency issue between reading and writing state files. This was previously raised by @ywelsch, but we have yet to solve it.
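
For readers unfamiliar with this kind of race, here is a minimal, deterministic sketch of one interleaving that ends in a NoSuchFileException: a reader lists a state directory, a deleter removes it, and the reader then opens the file it saw earlier. It is an illustration only. The class and method names (`StateFileRaceSketch`, `findStateFile`, `deleteStateDir`) are hypothetical, the `_state/state-0.st` layout is borrowed from paths quoted later in this thread, and whether Elasticsearch hits exactly this interleaving is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; none of these names come from Elasticsearch.
class StateFileRaceSketch {

    // Reader, step 1: list the state directory and pick a state file.
    static Path findStateFile(Path stateDir) throws IOException {
        try (Stream<Path> files = Files.list(stateDir)) {
            return files.filter(p -> p.getFileName().toString().startsWith("state-"))
                        .findFirst()
                        .orElseThrow(() -> new IllegalStateException("no state file"));
        }
    }

    // Deleter: remove every file in the state directory, then the directory itself.
    static void deleteStateDir(Path stateDir) throws IOException {
        List<Path> children;
        try (Stream<Path> files = Files.list(stateDir)) {
            children = files.collect(Collectors.toList());
        }
        for (Path child : children) {
            Files.delete(child);
        }
        Files.delete(stateDir);
    }

    // Replays the interleaving list -> delete -> open on a temporary directory
    // and reports whether the reader ended up with a NoSuchFileException.
    static boolean readerSeesMissingFile() {
        try {
            Path shardDir = Files.createTempDirectory("shard");
            Path stateDir = Files.createDirectory(shardDir.resolve("_state"));
            Files.write(stateDir.resolve("state-0.st"), new byte[] {1, 2, 3});

            Path discovered = findStateFile(stateDir); // reader lists the directory
            deleteStateDir(stateDir);                  // deleter runs in between
            try {
                Files.readAllBytes(discovered);        // reader opens the file it saw
                return false;
            } catch (NoSuchFileException e) {
                return true;
            } finally {
                Files.delete(shardDir);                // clean up the temporary directory
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader hit NoSuchFileException: " + readerSeesMissingFile());
    }
}
```

The steps are run sequentially on one thread so the outcome does not depend on timing; in a real cluster the reader and deleter would be separate threads and the failure would only appear occasionally, which would be consistent with the test not reproducing locally.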

jaymode commented 7 years ago

This test failed again today on master, but with a different issue related to failing to delete an index file:

ERROR   33.5s J0 | RareClusterStateIT.testUnassignedShardAndEmptyNodesInRoutingTable <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: Delete Index failed - not acked
   > Expected: <true>
   >      but: was <false>
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:131)
   >    at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:127)
   >    at org.elasticsearch.test.TestCluster.wipeIndices(TestCluster.java:140)
   >    at org.elasticsearch.test.TestCluster.wipe(TestCluster.java:77)
   >    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:575)
   >    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2036)
   >    at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
   >    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >    at java.base/java.lang.reflect.Method.invoke(Method.java:547)
   >    at java.base/java.lang.Thread.run(Thread.java:844)
   > Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=8467, name=elasticsearch[node_t0][clusterService#updateTask][T#1], state=RUNNABLE, group=TGRP-RareClusterStateIT]
   > Caused by: java.lang.AssertionError
   >    at __randomizedtesting.SeedInfo.seed([3A6C7188C00DBD]:0)
   >    at org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:477)
   >    at org.elasticsearch.indices.IndicesService.deleteShardStore(IndicesService.java:683)
   >    at org.elasticsearch.index.IndexService.onShardClose(IndexService.java:448)
   >    at org.elasticsearch.index.IndexService.access$100(IndexService.java:93)
   >    at org.elasticsearch.index.IndexService$StoreCloseListener.handle(IndexService.java:530)
   >    at org.elasticsearch.index.IndexService$StoreCloseListener.handle(IndexService.java:515)
   >    at org.elasticsearch.index.store.Store.closeInternal(Store.java:382)
   >    at org.elasticsearch.index.store.Store.access$000(Store.java:129)
   >    at org.elasticsearch.index.store.Store$1.closeInternal(Store.java:150)
   >    at org.elasticsearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:65)
   >    at org.elasticsearch.index.store.Store.decRef(Store.java:364)
   >    at org.elasticsearch.index.store.Store.close(Store.java:372)
   >    at org.elasticsearch.index.IndexService.closeShard(IndexService.java:428)
   >    at org.elasticsearch.index.IndexService.removeShard(IndexService.java:399)
   >    at org.elasticsearch.index.IndexService.close(IndexService.java:254)
   >    at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:542)
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:263)
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:197)
   >    at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:861)
   >    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:815)
   >    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:633)
   >    at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:1117)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569)
   >    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:238)
   >    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:201)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1161)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
   >    at java.base/java.lang.Thread.run(Thread.java:844)

This did not reproduce for me, but the reproduction line is:

gradle :core:integTest -Dtests.seed=3A6C7188C00DBD -Dtests.class=org.elasticsearch.indices.state.RareClusterStateIT -Dtests.method="testUnassignedShardAndEmptyNodesInRoutingTable" -Dtests.security.manager=true -Dtests.jvm.argline="--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.nio.file=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED --add-opens=java.base/java.util.regex=ALL-UNNAMED" -Dtests.locale=pt-GW -Dtests.timezone=Asia/Ulan_Bator

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+java9-periodic/2048/consoleText

bleskes commented 7 years ago

@ywelsch can you take a look please?

ywelsch commented 7 years ago

@jaymode It's the same issue (exposed in the stack trace):

Caused by: java.lang.AssertionError
   at __randomizedtesting.SeedInfo.seed([3A6C7188C00DBD]:0)
   at org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:477)

#19338 suggests a fix.

javanna commented 7 years ago

I believe UpdateNumberOfReplicasIT.testAutoExpandNumberReplicas1ToData failed here for the same reason: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.x+multijob-unix-compatibility/os=debian/668 .

nik9000 commented 7 years ago

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.3+dockeralpine-periodic/479/consoleFull

This is the suite result: https://gist.github.com/nik9000/fd3adac7f9fc62e8fb26876e135418da

javanna commented 7 years ago

another one here: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+g1gc/2629 .

jimczi commented 7 years ago

another one here: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+g1gc/2629/console

abeyad commented 7 years ago

another one: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+java9-periodic/2583/console

spinscale commented 7 years ago

another one on 5.6 with java 9

gradle :core:integTest -Dtests.seed=5CA06AE61C782F90 -Dtests.class=org.elasticsearch.indices.state.RareClusterStateIT -Dtests.method="testUnassignedShardAndEmptyNodesInRoutingTable" -Dtests.security.manager=true -Dtests.jvm.argline="--add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.nio.file=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED --add-opens=java.base/java.util.regex=ALL-UNNAMED" -Dtests.locale=sbp-TZ -Dtests.timezone=SystemV/EST5

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+java9-periodic/99/consoleFull

andyb-elastic commented 7 years ago

Another one on 6.0 https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.0+multijob-unix-compatibility/os=debian/60/consoleText

rjernst commented 7 years ago

Another instance: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.5+multijob-unix-compatibility/os=debian/84/console

tlrx commented 6 years ago

Looks like we are accumulating these errors:

java.lang.AssertionError
   at __randomizedtesting.SeedInfo.seed([74C174345C2EC14F]:0)
   at org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:460)
   at org.elasticsearch.indices.IndicesService.deleteShardStore(IndicesService.java:695)

See consoleText.txt

martijnvg commented 6 years ago

Another instance of this:

Failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/1084/consoleText

ERROR   43.5s J2 | RareClusterStateIT.testUnassignedShardAndEmptyNodesInRoutingTable <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: Delete Index failed - not acked
   > Expected: <true>
   >      but: was <false>
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:134)
   >    at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:130)
   >    at org.elasticsearch.test.TestCluster.wipeIndices(TestCluster.java:142)
   >    at org.elasticsearch.test.TestCluster.wipe(TestCluster.java:79)
   >    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:578)
   >    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2075)
   >    at java.lang.Thread.run(Thread.java:745)
   > Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1538, name=elasticsearch[node_t0][clusterApplierService#updateTask][T#1], state=RUNNABLE, group=TGRP-RareClusterStateIT]
   > Caused by: java.lang.AssertionError
   >    at __randomizedtesting.SeedInfo.seed([429E55B62DC1E952]:0)
   >    at org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:453)
   >    at org.elasticsearch.indices.IndicesService.deleteShardStore(IndicesService.java:695)
   >    at org.elasticsearch.index.IndexService.onShardClose(IndexService.java:464)
   >    at org.elasticsearch.index.IndexService.access$100(IndexService.java:98)
   >    at org.elasticsearch.index.IndexService$StoreCloseListener.accept(IndexService.java:543)
   >    at org.elasticsearch.index.IndexService$StoreCloseListener.accept(IndexService.java:530)
   >    at org.elasticsearch.index.store.Store.closeInternal(Store.java:448)
   >    at org.elasticsearch.index.store.Store.access$000(Store.java:130)
   >    at org.elasticsearch.index.store.Store$1.closeInternal(Store.java:151)
   >    at org.elasticsearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:65)
   >    at org.elasticsearch.index.store.Store.decRef(Store.java:430)
   >    at org.elasticsearch.index.store.Store.close(Store.java:438)
   >    at org.elasticsearch.index.IndexService.closeShard(IndexService.java:445)
   >    at org.elasticsearch.index.IndexService.removeShard(IndexService.java:415)
   >    at org.elasticsearch.index.IndexService.close(IndexService.java:275)
   >    at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:554)
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:285)
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:219)
   >    at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498)
   >    at java.lang.Iterable.forEach(Iterable.java:75)
   >    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495)
   >    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482)
   >    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
   >    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566)
   >    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244)
   >    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207)
   >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   >    at java.lang.Thread.run(Thread.java:745)

See consoleText3.txt

jasontedor commented 6 years ago

@bleskes This build failure has been open for over a year. Can you please see that it is addressed?

tlrx commented 6 years ago

Another instance: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=ubuntu/560

consoleText.txt

dnhatn commented 6 years ago

Another instance on 6.x https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=opensuse/598/consoleText

Log: testUnassignedShardAndEmptyNodesInRoutingTable.txt

cbuescher commented 6 years ago

Another one here: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+oracle-java10-periodic/79/console

martijnvg commented 6 years ago

Another instance of this failure:

ERROR   43.4s J0 | RareClusterStateIT.testUnassignedShardAndEmptyNodesInRoutingTable <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: Delete Index failed - not acked
   > Expected: <true>
   >      but: was <false>
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:134)
   >    at org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked(ElasticsearchAssertions.java:130)
   >    at org.elasticsearch.test.TestCluster.wipeIndices(TestCluster.java:141)
   >    at org.elasticsearch.test.TestCluster.wipe(TestCluster.java:78)
   >    at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:579)
   >    at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2086)
   >    at java.lang.Thread.run(Thread.java:748)
   > Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1595, name=elasticsearch[node_t0][clusterApplierService#updateTask][T#1], state=RUNNABLE, group=TGRP-RareClusterStateIT]
   > Caused by: java.lang.AssertionError: Paths exist that should have been deleted: [/private/var/lib/jenkins/workspace/elastic+elasticsearch+6.x+multijob-darwin-compatibility/server/build/testrun/integTest/J0/temp/org.elasticsearch.indices.state.RareClusterStateIT_5AC245D3140EDCC0-001/tempDir-004/data/nodes/0/indices/dCLhOOkKS9qeFqcGEXM_-w/0]
   >    at __randomizedtesting.SeedInfo.seed([5AC245D3140EDCC0]:0)
   >    at org.elasticsearch.env.NodeEnvironment.assertPathsDoNotExist(NodeEnvironment.java:470)
   >    at org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:460)
   >    at org.elasticsearch.indices.IndicesService.deleteShardStore(IndicesService.java:696)
   >    at org.elasticsearch.index.IndexService.onShardClose(IndexService.java:463)
   >    at org.elasticsearch.index.IndexService.access$100(IndexService.java:97)
   >    at org.elasticsearch.index.IndexService$StoreCloseListener.accept(IndexService.java:542)
   >    at org.elasticsearch.index.IndexService$StoreCloseListener.accept(IndexService.java:529)
   >    at org.elasticsearch.index.store.Store.closeInternal(Store.java:440)
   >    at org.elasticsearch.index.store.Store.access$000(Store.java:130)
   >    at org.elasticsearch.index.store.Store$1.closeInternal(Store.java:151)
   >    at org.elasticsearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:65)
   >    at org.elasticsearch.index.store.Store.decRef(Store.java:422)
   >    at org.elasticsearch.index.store.Store.close(Store.java:430)
   >    at org.elasticsearch.index.IndexService.closeShard(IndexService.java:444)
   >    at org.elasticsearch.index.IndexService.removeShard(IndexService.java:414)
   >    at org.elasticsearch.index.IndexService.close(IndexService.java:274)
   >    at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:555)
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:285)
   >    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:219)
   >    at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$6(ClusterApplierService.java:498)
   >    at java.lang.Iterable.forEach(Iterable.java:75)
   >    at org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:495)
   >    at org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:482)
   >    at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
   >    at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:161)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:566)
   >    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:244)
   >    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:207)
   >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   >    at java.lang.Thread.run(Thread.java:748)

consoleText.txt

Build url: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-darwin-compatibility/673/consoleText

spinscale commented 6 years ago

another one at https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-unix-compatibility/os=fedora/810/consoleText

not reproducible locally

full stack trace

```
2> مار 13, 2018 12:38:26 م com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
2> WARNING: Uncaught exception in thread: Thread[elasticsearch[node_t0][clusterService#updateTask][T#1],5,TGRP-RareClusterStateIT]
2> java.lang.AssertionError
2>    at __randomizedtesting.SeedInfo.seed([A9281E7681DBDBA8]:0)
2>    at org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:516)
2>    at org.elasticsearch.indices.IndicesService.deleteShardStore(IndicesService.java:680)
2>    at org.elasticsearch.index.IndexService.onShardClose(IndexService.java:439)
2>    at org.elasticsearch.index.IndexService.access$100(IndexService.java:92)
2>    at org.elasticsearch.index.IndexService$StoreCloseListener.accept(IndexService.java:521)
2>    at org.elasticsearch.index.IndexService$StoreCloseListener.accept(IndexService.java:506)
2>    at org.elasticsearch.index.store.Store.closeInternal(Store.java:397)
2>    at org.elasticsearch.index.store.Store.access$000(Store.java:126)
2>    at org.elasticsearch.index.store.Store$1.closeInternal(Store.java:147)
2>    at org.elasticsearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:64)
2>    at org.elasticsearch.index.store.Store.decRef(Store.java:379)
2>    at org.elasticsearch.index.store.Store.close(Store.java:387)
2>    at org.elasticsearch.index.IndexService.closeShard(IndexService.java:419)
2>    at org.elasticsearch.index.IndexService.removeShard(IndexService.java:390)
2>    at org.elasticsearch.index.IndexService.close(IndexService.java:246)
2>    at org.elasticsearch.indices.IndicesService.removeIndex(IndicesService.java:539)
2>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:258)
2>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:192)
2>    at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:814)
2>    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:768)
2>    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587)
2>    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263)
2>    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
2>    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
2>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:575)
2>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247)
2>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210)
2>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2>    at java.lang.Thread.run(Thread.java:748)
```

jasontedor commented 6 years ago

@bleskes Can you help find a path forward on getting this build failure resolved? I know that #19338 was proposed previously; can we take another look?

DaveCTurner commented 6 years ago

Another one here:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-java-periodic/ESJAVA=java9,ESRUNTIME=java10,nodes=linux/26/consoleText

   > Caused by: java.lang.AssertionError: Paths exist that should have been deleted: [/var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-java-periodic/ESJAVA/java9/ESRUNTIME/java10/nodes/linux/server/build/testrun/integTest/J0/temp/org.elasticsearch.indices.state.RareClusterStateIT_C9901BC8F496F938-001/tempDir-004/d1/nodes/0/indices/rFA6bNDHQT-cB8eWUA1MPw/0, /var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-java-periodic/ESJAVA/java9/ESRUNTIME/java10/nodes/linux/server/build/testrun/integTest/J0/temp/org.elasticsearch.indices.state.RareClusterStateIT_C9901BC8F496F938-001/tempDir-004/d3/nodes/0/indices/rFA6bNDHQT-cB8eWUA1MPw/0, /var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-java-periodic/ESJAVA/java9/ESRUNTIME/java10/nodes/linux/server/build/testrun/integTest/J0/temp/org.elasticsearch.indices.state.RareClusterStateIT_C9901BC8F496F938-001/tempDir-004/d0/nodes/0/indices/rFA6bNDHQT-cB8eWUA1MPw/0, /var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-java-periodic/ESJAVA/java9/ESRUNTIME/java10/nodes/linux/server/build/testrun/integTest/J0/temp/org.elasticsearch.indices.state.RareClusterStateIT_C9901BC8F496F938-001/tempDir-004/d2/nodes/0/indices/rFA6bNDHQT-cB8eWUA1MPw/0]
   >    at __randomizedtesting.SeedInfo.seed([C9901BC8F496F938]:0)
   >    at org.elasticsearch.env.NodeEnvironment.assertPathsDoNotExist(NodeEnvironment.java:462)
   >    at org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:452)
   >    at org.elasticsearch.indices.IndicesService.deleteShardStore(IndicesService.java:696)
   >    at org.elasticsearch.index.IndexService.onShardClose(IndexService.java:463)
...
  1> java.nio.file.NoSuchFileException: /var/lib/jenkins/workspace/elastic+elasticsearch+master+multijob-java-periodic/ESJAVA/java9/ESRUNTIME/java10/nodes/linux/server/build/testrun/integTest/J0/temp/org.elasticsearch.indices.state.RareClusterStateIT_C9901BC8F496F938-001/tempDir-004/d0/nodes/0/indices/rFA6bNDHQT-cB8eWUA1MPw/0/_state/state-0.st
  1>    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
  1>    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
  1>    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
  1>    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:215) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.newByteChannel(FilterFileSystemProvider.java:212) ~[lucene-test-framework-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:44]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.newByteChannel(FilterFileSystemProvider.java:212) ~[lucene-test-framework-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:44]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.newByteChannel(FilterFileSystemProvider.java:212) ~[lucene-test-framework-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:44]
  1>    at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:240) ~[lucene-test-framework-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:44]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.newByteChannel(FilterFileSystemProvider.java:212) ~[lucene-test-framework-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:44]
  1>    at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:240) ~[lucene-test-framework-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:44]
  1>    at java.nio.file.Files.newByteChannel(Files.java:369) ~[?:?]
  1>    at java.nio.file.Files.newByteChannel(Files.java:415) ~[?:?]
  1>    at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
  1>    at org.elasticsearch.gateway.MetaDataStateFormat.read(MetaDataStateFormat.java:179) ~[main/:?]
  1>    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:319) [main/:?]
  1>    at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:119) [main/:?]
  1>    at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:61) [main/:?]
  1>    at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140) [main/:?]
  1>    at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:260) [main/:?]
  1>    at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:256) [main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) [main/:?]
  1>    at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:656) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:672) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  1>    at java.lang.Thread.run(Thread.java:844) [?:?]
...
DaveCTurner commented 6 years ago

Note to future commenters: please seek out a stack trace for the code that resurrected the directory, rather than just the assertion failure. One may expect it to involve a call to MetaDataStateFormat.loadLatestState which should help with the search. The few examples that I can find only implicate TransportNodesListGatewayStartedShards, but we want to know if there are any other ways to get into this state.
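
As a rough aid for that search, the following sketch pulls every stack trace out of a saved console log that contains a given frame, such as `MetaDataStateFormat.loadLatestState`. It is not part of the Elasticsearch build or test tooling. The class name `StackTraceFinder`, the frame-matching regular expressions, and the shortened sample path are assumptions based on the log excerpts quoted in this thread, so other log layouts may need a different pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; the patterns assume console lines shaped like the ones in this thread.
class StackTraceFinder {
    // Frame lines such as "  1>    at org.example.Foo.bar(Foo.java:1)"; the "N>" prefix is optional.
    private static final Pattern FRAME = Pattern.compile("^\\s*(\\d*>)?\\s*at\\s+\\S+\\(");
    // "Caused by:" lines are kept inside the trace they belong to.
    private static final Pattern CAUSED_BY = Pattern.compile("^\\s*(\\d*>)?\\s*Caused by:");

    // Returns each stack trace (header line plus frames) whose text contains the marker.
    static List<String> tracesContaining(String log, String marker) {
        List<String> matches = new ArrayList<>();
        StringBuilder trace = null; // trace being collected, or null when outside a trace
        String previous = "";       // last non-frame line, used as the trace header
        for (String line : log.split("\\R")) {
            boolean frame = FRAME.matcher(line).find();
            if (frame && trace == null) {
                trace = new StringBuilder(previous).append('\n');
            }
            if (trace != null && (frame || CAUSED_BY.matcher(line).find())) {
                trace.append(line).append('\n');
                continue;
            }
            if (trace != null) {
                // A non-frame line ends the current trace.
                if (trace.indexOf(marker) >= 0) {
                    matches.add(trace.toString());
                }
                trace = null;
            }
            previous = line;
        }
        if (trace != null && trace.indexOf(marker) >= 0) {
            matches.add(trace.toString());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample input: frames copied from this thread, with a shortened made-up path.
        String log = String.join("\n",
            "  1> java.nio.file.NoSuchFileException: /tmp/0/_state/state-0.st",
            "  1>    at org.elasticsearch.gateway.MetaDataStateFormat.read(MetaDataStateFormat.java:179)",
            "  1>    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:319)",
            "some unrelated log line",
            "   > Caused by: java.lang.AssertionError",
            "   >    at org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:460)");
        for (String trace : tracesContaining(log, "MetaDataStateFormat.loadLatestState")) {
            System.out.println(trace);
        }
    }
}
```

To use it on a real failure, read the downloaded console text into a string and pass it as `log`; the callers listed above `loadLatestState` in each returned trace are the part worth posting here.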

bleskes commented 6 years ago

I took a look at testUnassignedShardAndEmptyNodesInRoutingTable. That test is as old as time and does a very bogus thing: it is an IT test which extracts the GatewayAllocator from the node and tells it to allocate unassigned shards, while giving it a conjured cluster state with no nodes in it (it uses DiscoveryNodes.EMPTY_NODES). This is never a cluster state we want to reroute on (we always have at least the master node in it). I'm going to just delete the test, as I don't think it adds much value.

Obviously there is a problem here but I feel this is better tracked by #29140 where we'll add a targeted test.