elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.99k stars 24.76k forks source link

ShrinkIndexIT.testCreateShrinkIndexToN fails on Windows #33857

Closed alpar-t closed 5 years ago

alpar-t commented 6 years ago

5.6

FAILURE 33.4s J0 | ShrinkIndexIT.testCreateShrinkIndexToN <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: timed out waiting for green state
   >    at __randomizedtesting.SeedInfo.seed([FAAB0749B4C53E6F:EAEB4985116575FC]:0)
   >    at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:933)
   >    at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:899)
   >    at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:888)
   >    at org.elasticsearch.action.admin.indices.create.ShrinkIndexIT.testCreateShrinkIndexToN(ShrinkIndexIT.java:115)
   >    at java.lang.Thread.run(Thread.java:748)

and

 Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\recovery.AWXo49VofFhOtlDMUmb5._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs
  1>    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1>    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1>    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1>    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1>    at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1>    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  1>    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:319) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:181) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:406) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:486) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:480) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[main/:?]
  1>    at org.elasticsearch.transport.local.LocalTransport$2.doRun(LocalTransport.java:390) ~[main/:?]
  1>    ... 5 more

[...]

1> java.io.IOException: Could not remove the following files (in the order of attempts):
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_0.cfs
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_1.cfs: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index\_1.cfs
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0\index
  2> REPRODUCE WITH: gradlew :core:integTest -Dtests.seed=FAAB0749B4C53E6F -Dtests.class=org.elasticsearch.action.admin.indices.create.ShrinkIndexIT -Dtests.method="testCreateShrinkIndexToN" -Dtests.security.manager=true -Dtests.locale=zh -Dtests.timezone=Etc/GMT+10
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg\0
  1>    C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg: java.nio.file.DirectoryNotEmptyException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_FAAB0749B4C53E6F-001\tempDir-002\data\nodes\2\indices\dT7bmk_uQtqKf2wjfSn3Lg
  1> 
  1>    at org.apache.lucene.util.IOUtils.rm(IOUtils.java:329) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  1>    at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectoryUnderLock(NodeEnvironment.java:558) ~[main/:?]
  1>    at org.elasticsearch.env.NodeEnvironment.deleteIndexDirectorySafe(NodeEnvironment.java:542) ~[main/:?]
  1>    at org.elasticsearch.indices.IndicesService.deleteIndexStoreIfDeletionAllowed(IndicesService.java:652) [main/:?]
  1>    at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:639) [main/:?]
  1>    at org.elasticsearch.indices.IndicesService.deleteIndexStore(IndicesService.java:634) [main/:?]
  1>    at org.elasticsearch.indices.IndicesService.deleteUnassignedIndex(IndicesService.java:602) [main/:?]
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.deleteIndices(IndicesClusterStateService.java:263) [main/:?]
  1>    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:192) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:814) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:768) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [main/:?]
  1>    at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [main/:?]
  1>    at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [main/:?]
  1>    at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:576) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

It also happened on 6.4, 6.x and master

I was not able to reproduce it on a Windows 2012 VM. There's no evidence of similar failures in the past, nor recent changes to the 5.6 that are obviously related.

elasticmachine commented 6 years ago

Pinging @elastic/es-core-infra

s1monw commented 6 years ago

@jasontedor @bleskes any ideas here? I looked but it's really puzzeling

elasticmachine commented 6 years ago

Pinging @elastic/es-distributed

ywelsch commented 6 years ago

Relates #30962

tvernum commented 6 years ago

It's failed again.

I'm going to mute it on Windows unless someone says they need to capture more failure logs.

ywelsch commented 6 years ago

I've reenabled the test on master (db327818dd0) and added DEBUG logging for org.elasticsearch.index.store which will possibly tell us whether the move failed because a previous delete in renameTempFilesSafe (where we catch exception and log as debug) failed. For the person on test triage: If this test fails again on CI, please link the latest failure here.

romseygeek commented 6 years ago

Another failure today: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-windows-compatibility/14/console

ywelsch commented 6 years ago

Unfortunately that was on 5.6 (which does not have the extended logging, and which we forgot to mute). I've muted the test on 5.6 for now (a6aa7738d59).

romseygeek commented 6 years ago

Now there's a master failure :)

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/14/console

romseygeek commented 6 years ago

PartitionedRoutingIT.testShrinking seems to have a similar issue: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/16/console

mayya-sharipova commented 6 years ago

Similar failures on 5.6, 6.x and master in the last 12 hours.

gradlew :core:integTest \
  -Dtests.seed=41732671CC43CCD8 \
  -Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT \
  -Dtests.method="testShrinking" \
  -Dtests.security.manager=true \
  -Dtests.locale=ar-MA \
  -Dtests.timezone=Europe/Tirane

seems to be caused by AccessDeniedException

Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J3\temp\org.elasticsearch.routing.PartitionedRoutingIT_22FF5B1F0934DA81-001\tempDir-002\data\nodes\0\indices\nCj_PB_1SvemMgYmkmpqNw\1\index\recovery.XyXJa1iLT_eQAR39YSXIKg._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J3\temp\org.elasticsearch.routing.PartitionedRoutingIT_22FF5B1F0934DA81-001\tempDir-002\data\nodes\0\indices\nCj_PB_1SvemMgYmkmpqNw\1\index\_0.cfs
cbuescher commented 6 years ago

fyi. recent master change built was: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/19/console Trying to get to the machine and take a look at the logs now.

cbuescher commented 6 years ago

Another recent one: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/18/console

droberts195 commented 6 years ago

I think this also affects SplitIndexIT.testSplitFromOneToN.

These two build logs contain the extra debugging info:

DaveCTurner commented 6 years ago

I spent some time trying to reproduce this by hand on one of the Windows CI workers. Although I never saw the test fail, I did see tests reporting the failures of recoveries for the reason quoted in the OP:

  1> org.elasticsearch.indices.recovery.RecoveryFailedException: [second_shrink][0]: Recovery failed from {node_sd2}{TO36W-4uR0GLz8RwgnmWXg}{vtq1HlXqTy-5ut_hiS2-TQ}{127.0.0.1}{127.0.0.1:52206} into {node_sd1}{VwtnN_EuS8aKva_oys857Q}{OCu18ksZRdKzsOChRaIFkw}{127.0.0.1}{127.0.0.1:52177}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:283) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:81) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:628) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  1>    at java.lang.Thread.run(Thread.java:844) [?:?]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:52206][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:99) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:51) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:108) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:105) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [7] files with total size of [8kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:422) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:99) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:51) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:108) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:105) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd1][127.0.0.1:52177][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_3825ACCB7B6BF7DB-001\tempDir-002\data\nodes\1\indices\4EzGadc6Rpa77FYUbJVhvA\0\index\recovery.uKRsdtvIQbmCSaJo9IlMzA._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\server\build\testrun\integTest\J0\temp\org.elasticsearch.action.admin.indices.create.ShrinkIndexIT_3825ACCB7B6BF7DB-001\tempDir-002\data\nodes\1\indices\4EzGadc6Rpa77FYUbJVhvA\0\index\_0.cfs
  1>    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89) ~[?:?]
  1>    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
  1>    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:298) ~[?:?]
  1>    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:288) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:40]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:40]
  1>    at java.nio.file.Files.move(Files.java:1413) ~[?:?]
  1>    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) ~[lucene-test-framework-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-8.0.0-snapshot-7d0a7782fa.jar:8.0.0-snapshot-7d0a7782fa 7d0a7782fa7e99250bccfb4d3e995485c3f0ca19 - jimczi - 2018-09-18 14:02:34]
  1>    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:335) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:444) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:561) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:555) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) ~[?:?]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
  1>    at java.lang.Thread.run(Thread.java:844) ~[?:?]

Here is a copy of the console output, including repeated test runs (of which the stack trace above is the last). This tarball also includes a Process Monitor log from this run, filtered to just the relevant paths: 33857.tar.gz

Interesting things to note from the process monitor log are:

  1. There's no other process involved here. I'm pretty sure that if this were Windows Defender then it'd also appear in this log.
  2. There seem to be repeated attempts to remove _0.cfs which result in CANNOT DELETE
  3. Finally we can see the rename results in ACCESS DENIED.

I do not yet have a good idea for a next step on this.

DaveCTurner commented 6 years ago

On repeated runs I do see occasional failures as per the OP (maybe 20% of the time, I've not done the stats in detail yet). I tried writing a test case that repeatedly hits the filesystem directly and could not reproduce this AccessDeniedException, so decided to turn my attention to all the Lucene-provided filesystem machinery that's in play here to see if I could rule that out as a source. The filesystem provider in use in a failing test is

HandleLimitFS(LeakFS(ShuffleFS(DisableFsyncFS(sun.nio.fs.WindowsFileSystemProvider@2d1a5acc))))

I was suspicious about that DisableFsyncFS in there (the rest look benign) so I added @SuppressFileSystems(value = "DisableFsyncFS") to the ShrinkIndexIT fixture, but still got failures. I then tried @SuppressFileSystems(value = "*") and still it fails - the filesystem provider in this case is confirmed to be a plain sun.nio.fs.WindowsFileSystemProvider. The story continues...

danielmitterdorfer commented 6 years ago

We have more test failures on Windows with the same cause:

5.6: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-windows-compatibility/27/console:

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

[2018-09-27T18:57:01,933][WARN ][o.e.c.a.s.ShardStateAction] [node_s0] [index_2][0] received shard failed for shard id [[index_2][0]], allocation id [0Jyeijx5QoeBLzJ85Ym4uw], primary term [0], message [failed recovery], failure [RecoveryFailedException[[index_2][0]: Recovery failed from {node_s1}{UFRWgSweS6uUDivYiTC1-w}{4v-Ak5QCQ62QpeiZyzwK8w}{local}{local[368]} into {node_s0}{Povm-pMfSAaqPh8OcDkjHg}{3UCCMKflQTOz9cF4-uejCA}{local}{local[366]}]; nested: RemoteTransportException[[node_s0][local[366]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [25] files with total size of [36.4kb]]; nested: RemoteTransportException[[node_s1][local[368]][internal:index/shard/recovery/clean_files]]; nested: AccessDeniedException[C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\recovery.AWYcZDdo0i__cHMcxiHN._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\_0.cfs]; ]
[...]
Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\recovery.AWYcZDdo0i__cHMcxiHN._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+5.6+multijob-windows-compatibility\core\build\testrun\integTest\J0\temp\org.elasticsearch.routing.PartitionedRoutingIT_D736F1C7B3CED870-001\tempDir-002\data\nodes\0\indices\qvLxi615Saym8cpW_c-H4Q\0\index\_0.cfs
  at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:40]
  at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:297) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:229) ~[lucene-test-framework-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:88) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
  at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:319) ~[main/:?]
  at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:181) ~[main/:?]
  at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:406) ~[main/:?]
  at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:486) ~[main/:?]
  at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:480) ~[main/:?]
  at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33) ~[main/:?]
  at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[main/:?]
  at org.elasticsearch.transport.local.LocalTransport$2.doRun(LocalTransport.java:390) ~[main/:?]

6.x: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/27/console

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [index_4][1]: Recovery failed from {node_s0}{7C-fb4R9RK-WS4Jk1WYXuQ}{BijPdN92Q32zPDZ9t_ldiA}{127.0.0.1}{127.0.0.1:55085} into {node_s1}{RqyKMgp3QZ-gu0IVK_U0AA}{I5djfftGTtmtr6bQbOAUJA}{127.0.0.1}{127.0.0.1:55086}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:624) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s0][127.0.0.1:55089][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [13] files with total size of [19.3kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:424) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_s1][127.0.0.1:55090][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.x+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_CAE88D19276CD2B2-001\tempDir-002\data\nodes\1\indices\H6TUqJpRTRedJr1z26jubw\1\index\recovery.fshzJJ1CQCelF3mD6b587Q._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.x+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_CAE88D19276CD2B2-001\tempDir-002\data\nodes\1\indices\H6TUqJpRTRedJr1z26jubw\1\index\_0.cfs
  1>    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1>    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1>    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1>    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:16]
  1>    at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1>    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1>    at org.apache.lucene.store.MockDirectoryWrapper.rename(MockDirectoryWrapper.java:231) ~[lucene-test-framework-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:01:13]
  1>    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:334) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:452) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:557) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:551) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1350) ~[main/:?]

6.4: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.4+multijob-windows-compatibility/28/console

org.elasticsearch.routing.PartitionedRoutingIT#testShrinking()

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [index_4][3]: Recovery failed from {node_sd1}{O8l8bMuJTv2FfmOwRxXWgg}{1m7utzhqQjWidDUawrikfQ}{127.0.0.1}{127.0.0.1:63944} into {node_sd2}{IBcohRX6RB2qHr4S6wDBUQ}{Fa3f-veRRFOWmrrADglJ9A}{127.0.0.1}{127.0.0.1:63945}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd1][127.0.0.1:63944][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [4] files with total size of [4.5kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [4] files with total size of [4.5kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:63945][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\2\indices\SI6IoZJ5R_Wb4Nqx06oDmA\3\index\recovery.CB9P345NRmmuu-tPzwjksg._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.routing.PartitionedRoutingIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\2\indices\SI6IoZJ5R_Wb4Nqx06oDmA\3\index\_0.cfs

org.elasticsearch.action.admin.indices.create.SplitIndexIT#testCreateSplitIndexToN():

Expand for stack trace

org.elasticsearch.indices.recovery.RecoveryFailedException: [second_split][0]: Recovery failed from {node_sd2}{hcRrVZwPQ7e8MTakDgz_GQ}{wfXHyRV6RPW0sKTXdamhdA}{127.0.0.1}{127.0.0.1:58177} into {node_sd3}{VlMAJBDSTemm3IJZgPciLA}{zPecWbBiSWaagZwl7Q1l6g}{127.0.0.1}{127.0.0.1:58171}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.doRecovery(PeerRecoveryTargetService.java:282) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.access$900(PeerRecoveryTargetService.java:80) [main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$RecoveryRunner.doRun(PeerRecoveryTargetService.java:623) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd2][127.0.0.1:58177][internal:index/shard/recovery/start_recovery]
  1> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: Phase[1] phase1 failed
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:174) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [5] files with total size of [5.6kb]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.phase1(RecoverySourceHandler.java:446) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:172) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.recover(PeerRecoverySourceService.java:98) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService.access$000(PeerRecoverySourceService.java:50) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:107) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoverySourceService$StartRecoveryTransportRequestHandler.messageReceived(PeerRecoverySourceService.java:104) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]
  1>    ... 5 more
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [node_sd3][127.0.0.1:58171][internal:index/shard/recovery/clean_files]
  1> Caused by: java.nio.file.AccessDeniedException: C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\3\indices\xKk-zXSXQsGpeMmq5ysVpA\0\index\recovery._dztPVO1RBKeP7ZpRHIbiA._0.cfs -> C:\Users\jenkins\workspace\elastic+elasticsearch+6.4+multijob-windows-compatibility\server\build\testrun\integTest\J2\temp\org.elasticsearch.action.admin.indices.create.SplitIndexIT_1C4D221BAD62999F-001\tempDir-002\data\nodes\3\indices\xKk-zXSXQsGpeMmq5ysVpA\0\index\_0.cfs
  1>    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83) ~[?:?]
  1>    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
  1>    at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301) ~[?:?]
  1>    at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) ~[?:?]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at org.apache.lucene.mockfile.FilterFileSystemProvider.move(FilterFileSystemProvider.java:147) ~[lucene-test-framework-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:47]
  1>    at java.nio.file.Files.move(Files.java:1395) ~[?:1.8.0_181]
  1>    at org.apache.lucene.store.FSDirectory.rename(FSDirectory.java:303) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.apache.lucene.store.FilterDirectory.rename(FilterDirectory.java:89) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
  1>    at org.elasticsearch.index.store.Store.renameTempFilesSafe(Store.java:337) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.renameAllTempFiles(RecoveryTarget.java:188) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.RecoveryTarget.cleanFiles(RecoveryTarget.java:439) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:556) ~[main/:?]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$CleanFilesRequestHandler.messageReceived(PeerRecoveryTargetService.java:550) ~[main/:?]
  1>    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[main/:?]
  1>    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[main/:?]
  1>    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1605) ~[main/:?]  

DaveCTurner commented 6 years ago

Looking harder at the logs, what happens is:

However we failed to delete the unassigned shards from the first node after the rebalance (at least twice, maybe it succeeds, we're still looking). For some reason when we try and allocate a shard on top of the old directory it then fails with the AccessDeniedException.

DaveCTurner commented 6 years ago

Note that we're using MMapDirectory here. Setting index.store.type: simplefs yielded no failure in 10 iterations.

DaveCTurner commented 6 years ago

I have tried to find a way to reproduce this with fewer moving parts, but so far this has failed. The following test passes.

/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.elasticsearch.action.admin.indices.create;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;
import org.elasticsearch.Version;
import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.shard.ShardId;
import org.elasticsearch.index.store.Store;
import org.elasticsearch.test.DummyShardLock;
import org.elasticsearch.test.ESTestCase;
import org.elasticsearch.test.junit.annotations.TestLogging;
import org.elasticsearch.threadpool.TestThreadPool;
import org.elasticsearch.threadpool.ThreadPool;

import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

import static org.elasticsearch.cluster.metadata.IndexMetaData.INDEX_NUMBER_OF_REPLICAS_SETTING;
import static org.elasticsearch.cluster.metadata.IndexMetaData.INDEX_NUMBER_OF_SHARDS_SETTING;
import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_INDEX_VERSION_CREATED;
import static org.elasticsearch.cluster.metadata.IndexMetaData.SETTING_NUMBER_OF_SHARDS;
import static org.elasticsearch.index.store.Store.INDEX_STORE_STATS_REFRESH_INTERVAL_SETTING;

@TestLogging("org.elasticsearch:TRACE,index.store.deletes:TRACE")
public class WindowsFilesystemTests extends ESTestCase {
    public void testWindowsFilesystem() {
        ThreadPool threadPool = new TestThreadPool("test");
        try {
            final Path tempDir1 = createTempDir();
            final IOContext ioContext = new IOContext();
            final int attemptCount = 1000;
            final CountDownLatch countDownLatch = new CountDownLatch(attemptCount);

            try (Directory rawDirectory = FSDirectory.open(tempDir1)) {
                logger.info("rawDirectory = {}", rawDirectory);

                final ShardId shardId = new ShardId("foo", "bar", 0);
                final Store store = new Store(shardId,
                    new IndexSettings(IndexMetaData.builder(shardId.getIndexName())
                        .settings(Settings.builder()
                            .put(INDEX_NUMBER_OF_SHARDS_SETTING.getKey(), 1)
                            .put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 0)
                            .put(SETTING_INDEX_VERSION_CREATED.getKey(), Version.CURRENT)
                            .put(INDEX_STORE_STATS_REFRESH_INTERVAL_SETTING.getKey(), TimeValue.ZERO)).build(), Settings.EMPTY),
                    rawDirectory, new DummyShardLock(shardId));

                final AtomicBoolean queryStatsThreadCancelled = new AtomicBoolean();
                final Thread queryStatsThread = new Thread(() -> {
                    while (queryStatsThreadCancelled.get() == false) {
                        try {
                            store.stats();
                        } catch (IOException e) {
                            throw new AssertionError("unexpected", e);
                        }
                    }
                });
                queryStatsThread.start();

                final Directory directory = store.directory();
                logger.info("directory = {}", directory);

                for (int i = 0; i < attemptCount; i++) {
                    for (int j = 1; j <= 2; j++) {
                        try (IndexOutput indexOutput = directory.createOutput("test-" + i + "-" + j + ".dat", ioContext)) {
                            int longCount = randomIntBetween(100, 10000);
                            for (int k = 0; k < longCount; k++) {
                                indexOutput.writeLong(randomLong());
                            }
                        }
                    }
                }

                for (int i = 0; i < attemptCount; i++) {
                    final String fileName1 = "test-" + i + "-1.dat";
                    final String fileName2 = "test-" + i + "-2.dat";

                    threadPool.generic().execute(() -> {
                        try {
                            directory.deleteFile(fileName1);
                            Thread.sleep(randomIntBetween(1, 50));
                        } catch (IOException | InterruptedException e) {
                            countDownLatch.countDown();
                            throw new AssertionError("unexpected", e);
                        }

                        threadPool.generic().execute(() -> {
                            try {
                                directory.rename(fileName2, fileName1);
                            } catch (IOException e) {
                                countDownLatch.countDown();
                                throw new AssertionError("unexpected", e);
                            }

                            threadPool.generic().execute(() -> {
                                try {
                                    directory.deleteFile(fileName1);
                                } catch (IOException e) {
                                    throw new AssertionError("unexpected", e);
                                } finally {
                                    countDownLatch.countDown();
                                }
                            });
                        });
                    });
                }

                countDownLatch.await();
                queryStatsThreadCancelled.set(true);
                queryStatsThread.join();
            }
        } catch (IOException | InterruptedException e) {
            throw new AssertionError("exception", e);
        } finally {
            threadPool.shutdown();
        }
    }
}
DaveCTurner commented 6 years ago

One interesting side-question is why this happens more on tests to do with shrinking. I think this is the answer:

https://github.com/elastic/elasticsearch/blob/c4b831645cb0369449d092a14ecaff4c04528141/server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java#L726-L727

In other cases, I guess an allocation that fails like this will retry (up to 5 times) and subsequent attempts will succeed.


Correction: subsequent attempts to allocate a copy of the shrunk index onto the node containing the source index will, I think, all fail, because of https://github.com/elastic/elasticsearch/issues/33857#issuecomment-425438171 (unless either the source index is removed or closed, or else all the shared segments are merged away, neither of which seems very likely in this test).

DaveCTurner commented 6 years ago

To test this hypothesis, I tried writing a test that repeatedly allocates a shard to a node and then removes it, hoping to hit an allocation failure, with max retries set to 0:

@ClusterScope(scope = TEST, numDataNodes = 2)
@TestLogging("org.elasticsearch:TRACE,index.store.deletes:TRACE")
public class ReliableRecoveryIT extends ESIntegTestCase {
    public void testReliableRecovery() {
        prepareCreate("index").setSettings(Settings.builder().put(indexSettings())
            .put(INDEX_NUMBER_OF_SHARDS_SETTING.getKey(), 1)
            .put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 1)
            .put(SETTING_ALLOCATION_MAX_RETRY.getKey(), 0))
            .get();

        ensureGreen("index");

        for (int i = 0; i < 20; i++) {
            internalCluster().client().prepareBulk()
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .add(new IndexRequest("index", "_doc", null).source("{}", XContentType.JSON))
                .setRefreshPolicy(IMMEDIATE)
                .get();
        }

        for (int i = 0; i < 10; i++) {
            logger.info("--> removing replica [{}]", i);
            internalCluster().client().admin().indices().prepareUpdateSettings("index").setSettings(
                Settings.builder().put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 0)).get();
            logger.info("--> adding replica [{}]", i);
            internalCluster().client().admin().indices().prepareUpdateSettings("index").setSettings(
                Settings.builder().put(INDEX_NUMBER_OF_REPLICAS_SETTING.getKey(), 1)).get();
            logger.info("--> awaiting green [{}]", i);
            ensureGreen("index");
        }
    }
}

This passes. I'm mystified.

bleskes commented 6 years ago

One interesting side-question is why this happens more on tests to do with shrinking. I think this is the answer:

Good catch. We should clean these up IMO.

DaveCTurner commented 6 years ago

Ok, I've got it 🎉

The following test passes on Windows (i.e. throws an AccessDeniedException on the rename). It seems that if you create a hard link to a file, and then delete the link, but the original file is still open, then you can't rename something else into the place where the hard link used to be.

    public void testWindowsFilesystem() throws IOException {
        final Path directoryPath = createTempDir();
        final IOContext ioContext = new IOContext();

        try (Directory directory = FSDirectory.open(directoryPath)) {
            logger.info("directory = {}", directory);

            final String fileName1 = "test-1.dat";
            final String fileName2 = "test-2.dat";
            final String fileName3 = "test-3.dat";

            Files.write(directoryPath.resolve(fileName1), new byte[]{0}); // must be nonempty
            Files.write(directoryPath.resolve(fileName2), new byte[0]); // empty file

            try (IndexInput ignored = directory.openInput(fileName1, ioContext)) {
                Files.createLink(directoryPath.resolve(fileName3), directoryPath.resolve(fileName1));
                directory.deleteFile(fileName3);
                expectThrows(AccessDeniedException.class, () -> directory.rename(fileName2, fileName3));
            }
        }
    }

More precisely, attempts to delete fileName3 while fileName1 is still open will fail with AccessDeniedException, but we suppress these inside FSDirectory.

This only happens when using MMapDirectory and not SimpleFSDirectory.

mayya-sharipova commented 6 years ago

Still experiencing failure today on 6.4

Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.4+multijob-windows-compatibility/34/console

REPRODUCE WITH: gradlew :server:integTest \ -Dtests.seed=544A239B8224F7C5 \ -Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT \ -Dtests.method="testShrinking" \ -Dtests.security.manager=true \ -Dtests.locale=en-CA \ -Dtests.timezone=SystemV/YST9YDT

and 6.x Log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-windows-compatibility/35/console

07:44:40 FAILURE 33.3s J2 | PartitionedRoutingIT.testShrinking <<< FAILURES! 07:44:40 > Throwable #1: java.lang.AssertionError: timed out waiting for green state 07:44:40 > at __randomizedtesting.SeedInfo.seed([544A239B8224F7C5:AE69E6F0A5B2602E]:0) 07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureColor(ESIntegTestCase.java:967) 07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:923) 07:44:40 > at org.elasticsearch.test.ESIntegTestCase.ensureGreen(ESIntegTestCase.java:912) 07:44:40 > at org.elasticsearch.routing.PartitionedRoutingIT.testShrinking(PartitionedRoutingIT.java:96) 07:44:40 > at java.lang.Thread.run(Thread.java:748)

mayya-sharipova commented 6 years ago

I have muted PartitionedRoutingIT:: testShrinking test on 6.4 and 6.x on Windows

hub-cap commented 6 years ago

saw again in 5.6 via https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.6+multijob-windows-compatibility/41/consoleText

original-brownbear commented 6 years ago

master still failing similarly:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-windows-compatibility/42/console
REPRODUCE WITH: gradlew :server:integTest \
  -Dtests.seed=7379D1534EE3AFC8 \
  -Dtests.class=org.elasticsearch.routing.PartitionedRoutingIT \
  -Dtests.method="testShrinking" \
  -Dtests.security.manager=true \
  -Dtests.locale=pl-PL \
  -Dtests.timezone=US/Arizona \
  -Dcompiler.java=11 \
  -Druntime.java=8
DaveCTurner commented 6 years ago

We've worked out what's going on here so I've unassigned myself from this issue. Similar failures on Windows should be muted until we work out what to do with it - no need to report them here.

ywelsch commented 5 years ago

This is a Windows limitation with MMapDirectory we cannot do anything about -> Closing