elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.17k stars 24.84k forks source link

[CI] SingleNodeDiscoveryIT testCannotJoinNodeWithSingleNodeDiscovery failing #106425

Closed davidkyle closed 6 months ago

davidkyle commented 8 months ago

Build scan: https://gradle-enterprise.elastic.co/s/a5dvvhunctzui/tests/:server:internalClusterTest/org.elasticsearch.discovery.single.SingleNodeDiscoveryIT/testCannotJoinNodeWithSingleNodeDiscovery

Reproduction line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.discovery.single.SingleNodeDiscoveryIT.testCannotJoinNodeWithSingleNodeDiscovery" -Dtests.seed=EDFBA5FF9920092C -Dtests.locale=ga-IE -Dtests.timezone=Pacific/Saipan -Druntime.java=21

Applicable branches: main

Reproduces locally?: Didn't try

Failure history: Failure dashboard for org.elasticsearch.discovery.single.SingleNodeDiscoveryIT#testCannotJoinNodeWithSingleNodeDiscovery&_a=(controlGroupInput:(chainingSystem:HIERARCHICAL,controlStyle:twoLine,ignoreParentSettings:(ignoreFilters:!f,ignoreQuery:!f,ignoreTimerange:!f,ignoreValidations:!t),panels:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:task.keyword,grow:!t,id:'0c0c9cb8-ccd2-45c6-9b13-96bac4abc542',searchTechnique:wildcard,selectedOptions:!(),singleSelect:!t,title:'Gradle%20Task',width:medium),grow:!t,order:0,type:optionsListControl,width:small),'144933da-5c1b-4257-a969-7f43455a7901':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:name.keyword,grow:!t,id:'144933da-5c1b-4257-a969-7f43455a7901',searchTechnique:wildcard,selectedOptions:!('testCannotJoinNodeWithSingleNodeDiscovery'),title:Test,width:medium),grow:!t,order:2,type:optionsListControl,width:medium),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:className.keyword,grow:!t,id:'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850',searchTechnique:wildcard,selectedOptions:!('org.elasticsearch.discovery.single.SingleNodeDiscoveryIT'),title:Suite,width:medium),grow:!t,order:1,type:optionsListControl,width:medium)))))

Failure excerpt:

java.lang.AssertionError: 
Expected: (an empty collection or iterable containing [a string starting with "JNDI lookup class is not available because this JRE does not support JNDI. JNDI string lookups will not be available, continuing configuration.", a string starting with "JMX runtime input lookup class is not available because this JRE does not support JMX. JMX lookups will not be available, continuing configuration. "] or iterable containing [a string starting with "JMX runtime input lookup class is not available because this JRE does not support JMX. JMX lookups will not be available, continuing configuration. "])
     but: was <[Attempted to append to non-started appender mock]>

  at __randomizedtesting.SeedInfo.seed([EDFBA5FF9920092C:120F4759CD49A567]:0)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
  at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
  at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2147)
  at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:718)
  at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:524)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)
elasticsearchmachine commented 8 months ago

Pinging @elastic/es-distributed (Team:Distributed)

pxsalehi commented 7 months ago

This seems like a test issue which may or may not be related to https://github.com/elastic/elasticsearch/pull/102516. But I think this is for delivery team maybe. Feel free to relabel if that's not the case.

elasticsearchmachine commented 7 months ago

Pinging @elastic/es-delivery (Team:Delivery)

mark-vieira commented 7 months ago

I don't have much insight into the internal test cluster fixtures. This is an area with fuzzy ownership to be honest but I think core/infra is the most appropriate.

elasticsearchmachine commented 7 months ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

rjernst commented 7 months ago

I investigated this a bit, but could not come to any conclusions. The assertion is cryptic, but it results from a MockLogAppender not being started when a message is appended to it. Yet that check happens within log4j (AppenderControl), and goes to the status logger. So we don't have any control or ability to fail harder when this happens to find where it is actually occurring.

My only thought for a potential fix is to always start MockLogAppender instances within the constructor. I don't see a downside to this, but perhaps I'm missing something.

DaveCTurner commented 7 months ago

I think it might just not be threadsafe to remove appenders from loggers and stop the appender while other threads might be logging stuff. The following test pretty reliably fails for me:

public void testConcurrentLogAndLifecycle() throws Exception {
    final var keepGoing = new AtomicBoolean(true);
    final var logThread = new Thread(() -> {
        while (keepGoing.get()) {
            logger.info("test");
        }
    });
    logThread.start();

    final var appender = new MockLogAppender();
    for (int i = 0; i < 1000; i++) {
        try (var ignored = appender.capturing(MockLogAppenderTests.class)) {
            Thread.yield();
        }
    }

    keepGoing.set(false);
    logThread.join();
}
rjernst commented 6 months ago

Thanks @DaveCTurner, you are right, it appears modifying appenders is simply not threadsafe. I have an idea which seems to work: adding a single appender during test static init (much like some other appenders we add in ESTestCase). That single appender internally keeps track of mapping between loggers and expections in a threadsafe way. Witht hat change I was able to get your example test above to pass consistently over many runs.

However, there are a lot of places adding the MockLogAppender to appenders directly which needs to be addressed first. I've opened https://github.com/elastic/elasticsearch/pull/108114 to do that.