elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

FrozenIndexShardTests.testRecoverFromFrozenPrimary fails with new Lucene snapshot #110898

Closed benwtrent closed 1 week ago

benwtrent commented 3 months ago

CI Link

https://gradle-enterprise.elastic.co/s/b77e65ulskhjg

Repro line

./gradlew ":x-pack:plugin:frozen-indices:test" --tests "org.elasticsearch.index.engine.frozen.FrozenIndexShardTests.testRecoverFromFrozenPrimary" -Dtests.seed=2B3BFA46A5FBD920 -Dtests.locale=es-BO -Dtests.timezone=America/North_Dakota/New_Salem -Druntime.java=22

Does it reproduce?

Yes

Applicable branches

lucene_snapshot

Failure history

No response

Failure excerpt

java.lang.AssertionError: org.elasticsearch.indices.recovery.RecoveryFailedException: [index][0]: Recovery failed from {ZGkPqnuVVU}{ZGkPqnuVVU}{KiQWN1HJSqa90E12uMaXeQ}{ZGkPqnuVVU}{0.0.0.0}{0.0.0.0:3}{IScdfhilmrstvw}{8.16.0}{7000099-8600000} into {bqiDOJnrrE}{bqiDOJnrrE}{7RUGdJ3vTLKGlr2T5ziOuA}{bqiDOJnrrE}{0.0.0.0}{0.0.0.0:4}{IScdfhilmrstvw}{8.16.0}{7000099-8600000}
        at __randomizedtesting.SeedInfo.seed([2B3BFA46A5FBD920:450D0E6510631AC8]:0)
        at org.elasticsearch.index.shard.IndexShardTestCase$2.onRecoveryFailure(IndexShardTestCase.java:149)
        at org.elasticsearch.indices.recovery.RecoveryTarget.notifyListener(RecoveryTarget.java:316)
        at org.elasticsearch.indices.recovery.RecoveryTarget.fail(RecoveryTarget.java:303)
        at org.elasticsearch.index.shard.IndexShardTestCase.recoverUnstartedReplica(IndexShardTestCase.java:883)
        at org.elasticsearch.index.shard.IndexShardTestCase.recoverReplica(IndexShardTestCase.java:812)
        at org.elasticsearch.index.shard.IndexShardTestCase.recoverReplica(IndexShardTestCase.java:788)
        at org.elasticsearch.index.engine.frozen.FrozenIndexShardTests.testRecoverFromFrozenPrimary(FrozenIndexShardTests.java:46)

        Caused by:
        org.elasticsearch.indices.recovery.RecoveryFailedException: [index][0]: Recovery failed from {ZGkPqnuVVU}{ZGkPqnuVVU}{KiQWN1HJSqa90E12uMaXeQ}{ZGkPqnuVVU}{0.0.0.0}{0.0.0.0:3}{IScdfhilmrstvw}{8.16.0}{7000099-8600000} into {bqiDOJnrrE}{bqiDOJnrrE}{7RUGdJ3vTLKGlr2T5ziOuA}{bqiDOJnrrE}{0.0.0.0}{0.0.0.0:4}{IScdfhilmrstvw}{8.16.0}{7000099-8600000}
            at app//org.elasticsearch.index.shard.IndexShardTestCase.recoverUnstartedReplica(IndexShardTestCase.java:883)
            ... 3 more

            Caused by:
            java.lang.WrongThreadException: Attempted access outside owning thread
                at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:314)
                at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:113)
                at java.base/jdk.internal.foreign.MemorySessionImpl.checkValidState(MemorySessionImpl.java:209)
                at java.base/jdk.internal.foreign.ConfinedSession.justClose(ConfinedSession.java:82)
                at java.base/jdk.internal.foreign.MemorySessionImpl.close(MemorySessionImpl.java:232)
                at java.base/jdk.internal.foreign.ArenaImpl.close(ArenaImpl.java:50)
                at org.apache.lucene.store.MemorySegmentIndexInput.close(MemorySegmentIndexInput.java:514)
                at org.apache.lucene.tests.store.MockIndexInputWrapper.close(MockIndexInputWrapper.java:81)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:71)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:87)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:63)
                at org.elasticsearch.indices.recovery.RecoverySourceHandler$2.close(RecoverySourceHandler.java:1426)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:71)
                at org.elasticsearch.core.IOUtils.close(IOUtils.java:87)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer.onCompleted(MultiChunkTransfer.java:144)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:113)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:72)
                at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:97)
                at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:85)
                at org.elasticsearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:73)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:83)
                at org.elasticsearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$4(MultiChunkTransfer.java:120)
                at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:249)
                at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onResponse(ActionListenerImplementations.java:307)
                at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:392)
                at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onResponse(ActionListenerImplementations.java:307)
                at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:392)
                at org.elasticsearch.indices.recovery.RecoveryTarget.writeFileChunk(RecoveryTarget.java:583)
                at org.elasticsearch.indices.recovery.AsyncRecoveryTarget.lambda$writeFileChunk$6(AsyncRecoveryTarget.java:118)
                at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)
                at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
                at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
                at java.base/java.lang.Thread.run(Thread.java:1570)

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-distributed (Team:Distributed)

benwtrent commented 3 months ago

I wanted to mark this as a blocker, but it only affects the lucene_snapshot branch. My concern is that since that branch is long-lived, we will forget about this :/

benwtrent commented 3 months ago

Actually, this "access from the wrong thread" is causing other tests to fail as well.

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search (Team:Search)

arteam commented 3 months ago

I've looked at the test and it doesn't seem that we do anything non-standard in it: RecoverySourceHandler just closes the resources it opened, including MemorySegmentIndexInput. I feel that the issue is on Lucene's side, in the implementation of MemorySegmentIndexInput and how it uses the foreign-memaccess API.
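
For illustration, the failure mode in the trace can be reproduced outside Elasticsearch with a minimal sketch, assuming JDK 22 or later (where `java.lang.foreign` is a final API). The class and method names below are hypothetical and this is not the Lucene or Elasticsearch code. That the arena in question is thread-confined is inferred only from the `ConfinedSession.justClose` frame in the stack trace above.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; not part of Lucene or Elasticsearch.
final class ConfinedArenaCloseDemo {

    /**
     * Opens a confined arena on the calling thread, then attempts to close it
     * from a second thread. Returns whatever that second close() threw, or
     * null if it threw nothing.
     */
    static Throwable closeFromOtherThread() {
        Arena arena = Arena.ofConfined();            // owned by the calling thread
        MemorySegment segment = arena.allocate(16);  // memory tied to the arena's lifetime
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Thread closer = new Thread(() -> {
            try {
                arena.close();                       // called from a non-owner thread
            } catch (Throwable t) {
                failure.set(t);
            }
        });
        closer.start();
        try {
            closer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for closer thread", e);
        } finally {
            arena.close();                           // the owner thread may close it
        }
        return failure.get();
    }

    public static void main(String[] args) {
        Throwable t = closeFromOtherThread();
        System.out.println(t == null ? "no failure" : t.toString());
    }
}
```

In the trace, the file is closed on a thread-pool worker (via `MultiChunkTransfer.onCompleted`), which would fail in the same way if the underlying arena was created as confined on a different thread. A shared arena (`Arena.ofShared()`) does not carry that restriction.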

elasticsearchmachine commented 1 week ago

This issue has been closed because it has been open for too long with no activity.

Any muted tests that were associated with this issue have been unmuted.

If the tests begin failing again, a new issue will be opened, and they may be muted again.