elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.12k stars 24.83k forks source link

[CI] SearchServiceTests testWaitOnRefreshFailsIfCheckpointNotIndexed failing #102754

Closed luigidellaquila closed 7 months ago

luigidellaquila commented 11 months ago

Build scan: https://gradle-enterprise.elastic.co/s/tc2hypko6u4fm/tests/:server:test/org.elasticsearch.search.SearchServiceTests/testWaitOnRefreshFailsIfCheckpointNotIndexed

Reproduction line:

./gradlew ':server:test' --tests "org.elasticsearch.search.SearchServiceTests.testWaitOnRefreshFailsIfCheckpointNotIndexed" -Dtests.seed=9ABEAB21235CF993 -Dtests.locale=sv-SE -Dtests.timezone=Canada/Saskatchewan -Druntime.java=21

Applicable branches: main

Reproduces locally?: No

Failure history: https://es-delivery-stats.elastic.dev/app/dashboards#/view/dcec9e60-72ac-11ee-8f39-55975ded9e63?_g=(refreshInterval:(pause:!t,value:60000),time:(from:now-7d%2Fd,to:now))&_a=(controlGroupInput:(chainingSystem:HIERARCHICAL,controlStyle:twoLine,ignoreParentSettings:(ignoreFilters:!f,ignoreQuery:!f,ignoreTimerange:!f,ignoreValidations:!t),panels:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:task.keyword,grow:!t,id:'0c0c9cb8-ccd2-45c6-9b13-96bac4abc542',searchTechnique:wildcard,selectedOptions:!(),singleSelect:!t,title:'Gradle%20Task',width:medium),grow:!t,order:0,type:optionsListControl,width:small),'144933da-5c1b-4257-a969-7f43455a7901':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:name.keyword,grow:!t,id:'144933da-5c1b-4257-a969-7f43455a7901',searchTechnique:wildcard,selectedOptions:!('testWaitOnRefreshFailsIfCheckpointNotIndexed'),title:Test,width:medium),grow:!t,order:2,type:optionsListControl,width:medium),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:className.keyword,grow:!t,id:'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850',searchTechnique:wildcard,selectedOptions:!('org.elasticsearch.search.SearchServiceTests'),title:Suite,width:medium),grow:!t,order:1,type:optionsListControl,width:medium))))

Failure excerpt:

junit.framework.AssertionFailedError: Unexpected exception type, expected IllegalArgumentException but got org.elasticsearch.ElasticsearchTimeoutException: Wait for seq_no [1] refreshed timed out [20ms]

  at __randomizedtesting.SeedInfo.seed([9ABEAB21235CF993:F3FA3ADA896F76DC]:0)
  at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2869)
  at org.apache.lucene.tests.util.LuceneTestCase.expectThrows(LuceneTestCase.java:2850)
  at org.elasticsearch.search.SearchServiceTests.testWaitOnRefreshFailsIfCheckpointNotIndexed(SearchServiceTests.java:1786)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

  Caused by: org.elasticsearch.ElasticsearchTimeoutException: Wait for seq_no [1] refreshed timed out [20ms]

    at org.elasticsearch.search.SearchService.lambda$ensureAfterSeqNoRefreshed$4(SearchService.java:572)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916)
    at org.elasticsearch.threadpool.ThreadPool$1.run(ThreadPool.java:481)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.lang.Thread.run(Thread.java:1583)
elasticsearchmachine commented 11 months ago

Pinging @elastic/es-search (Team:Search)

thecoop commented 11 months ago

Another one: https://gradle-enterprise.elastic.co/s/xf64wpeoac43s/tests/task/:server:test/details/org.elasticsearch.search.SearchServiceTests/testWaitOnRefreshFailsIfCheckpointNotIndexed?top-execution=1

pmpailis commented 9 months ago

Looked into the test failure, and AFAICT this seems to be purely based on timing. It seems that the specified timeout (getWaitForCheckpointsTimeout) passes, before we reach the expected exception point (where we check the seq.no and throw an IllegalArgumentException due to not having performed so many operations yet). There are 30 such failures in the past 3 months ( https://es-delivery-stats.elastic.dev/app/r/s/USaH0 ) with the following 4 error messages:

The first exception started Jan. 6 and has also been reported and addressed in this issue.

Looking at the exception in the description now, it seems that all but one were with very low timeout thersholds (10-25ms), and we had just one instance with 95ms. I believe that this is just a test-issue and no changes in production code are required. Given that we want to test the seq. no and not the timeout in this case, I'd suggest to simply increase the timeout to more safe bounds (e.g. [100-200] or [200-300]) to isolate this potential issue and focus solely on what we want to actually test. The timeout itself is covered by a different test in SearchServiceTests#testWaitOnRefreshTimeout.

mosche commented 8 months ago

This is still failing for 8.12, see this test run and the history here. It looks like the timeout changes need to be back-ported accordingly.

pmpailis commented 7 months ago

Backported to 8.12 and 8.13.