elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] XPackRestIT test {p0=ml/forecast/Test forecast unknown job} failing #116150

Open elasticsearchmachine opened 2 weeks ago

elasticsearchmachine commented 2 weeks ago

Build Scans:

Reproduction Line:

./gradlew ":x-pack:plugin:yamlRestTest" --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/forecast/Test forecast unknown job}" -Dtests.seed=F1BDE02137EABDC7 -Dtests.locale=smn -Dtests.timezone=Europe/Uzhgorod -Druntime.java=23

Applicable branches: main

Reproduces locally?: N/A

Failure History: See dashboard

Failure Message:

org.junit.TestCouldNotBeSkippedException: Test could not be skipped due to other failures

Issue Reasons:

Note: This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine commented 2 weeks ago

This has been muted on branch main

Mute Reasons:

Build Scans:

elasticsearchmachine commented 2 weeks ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine commented 2 weeks ago

Pinging @elastic/ml-core (Team:ML)

davidkyle commented 2 weeks ago

The failure is due to an assertion error, visible in the logs, that takes down the node:

[2024-11-01T02:46:22,021][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [yamlRestTest-0] fatal error in thread [elasticsearch[yamlRestTest-0][system_critical_write][T#3]], exiting
java.lang.AssertionError: null
    at org.elasticsearch.index.mapper.IgnoredSourceFieldMapper.postParse(IgnoredSourceFieldMapper.java:161) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:190) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:136) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:113) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:1043) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:984) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:928) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:378) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:237) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:305) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:153) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:80) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:220) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.lang.Thread.run(Thread.java:1575) ~[?:?]

yamlRestTest.log
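
For readers unfamiliar with why a single assertion can end a node, here is a minimal, self-contained sketch of the general mechanism: an `AssertionError` thrown on a worker thread is not caught by ordinary exception handling and ends up in the thread's uncaught-exception handler. The class `FatalErrorDemo` and the method `postParse(boolean)` are hypothetical stand-ins and are not the Elasticsearch code. The sketch only records the error, whereas the log above shows the real handler logging a fatal error and exiting.

```java
import java.util.concurrent.atomic.AtomicReference;

final class FatalErrorDemo {
    /** Runs the task on a new thread and returns whatever reached its uncaught-exception handler, or null. */
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = new Thread(task, "demo-write-thread");
        // Assumption: a real node would log and exit here; this sketch only records the throwable.
        worker.setUncaughtExceptionHandler((thread, error) -> seen.set(error));
        worker.start();
        try {
            worker.join(); // the handler runs on the dying thread, so it has finished once join returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }

    /** Hypothetical stand-in for a parsing step guarded by an invariant check. */
    static void postParse(boolean invariantHolds) {
        if (!invariantHolds) {
            // Thrown explicitly so the sketch behaves the same with or without the -ea JVM flag.
            throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        Throwable fatal = runAndCapture(() -> postParse(false));
        System.out.println("handler saw: " + fatal);
    }
}
```

Once the node has exited, later requests against it fail, which is one plausible way an unrelated test ends up reporting the failure.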

elasticsearchmachine commented 2 weeks ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)

kkrik-es commented 2 weeks ago

@davidkyle thanks for looking. The stack trace above seems like an issue with synthetic source indeed.

I just synced and can't reproduce the issue. Did you just use the command above, on main? If it no longer reproduces, I'm tempted to unmute and see if it comes back.

kkrik-es commented 2 weeks ago

Btw the first failure link above points to a different error:

REPRODUCE WITH: ./gradlew ":x-pack:plugin:yamlRestTest" --tests "org.elasticsearch.xpack.test.rest.XPackRestIT.test {p0=ml/forecast/Test forecast unknown job}" -Dtests.seed=F1BDE02137EABDC7 -Dtests.locale=smn -Dtests.timezone=Europe/Uzhgorod -Druntime.java=23

XPackRestIT > test {p0=ml/forecast/Test forecast unknown job} FAILED
    org.junit.TestCouldNotBeSkippedException: Test could not be skipped due to other failures
        at org.junit.runners.model.MultipleFailureException.<init>(MultipleFailureException.java:36)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1014)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:1575)

        Caused by:
        org.junit.AssumptionViolatedException: [ml/forecast/Test forecast unknown job] skipped, reason: [https://github.com/elastic/elasticsearch/issues/34747]

The second link is from https://github.com/elastic/elasticsearch/pull/116049, a PR that has changes for synthetic source, so it's irrelevant.

kkrik-es commented 2 weeks ago

Assigning back to @davidkyle since this doesn't seem like an issue with synthetic source. Please assign back to me if a failure shows up outside a PR or another repro points to a parsing exception.

davidkyle commented 2 weeks ago

Thanks for the investigation @kkrik-es

davidkyle commented 2 weeks ago

The failing test is actually muted. The TestCouldNotBeSkippedException means that the error occurred in either the test setup or the teardown. In this case it is a search_phase_execution_exception in the teardown:

    org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:35139], URI [/_ml/trained_models/_stats?size=10000], status line [HTTP/1.1 500 Internal Server Error]  
    {"error":{"root_cause":[],"type":"exception","reason":"Searching for stats for models [lang_ident_model_1] failed","caused_by":{"type":"search_phase_execution_exception","reason":"","phase":"query","grouped":true,"failed_shards":[],"caused_by":{"type":"search_phase_execution_exception","reason":"Search rejected due to missing shards [[.ml-stats-000001][0]]. Consider using `allow_partial_search_results` setting to bypass this error.","phase":"query","grouped":true,"failed_shards":[]}}},"status":500} 
        at app//org.elasticsearch.client.RestClient.convertResponse(RestClient.java:351)    
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:317) 
        at app//org.elasticsearch.client.RestClient.performRequest(RestClient.java:292) 
        at app//org.elasticsearch.xpack.core.ml.integration.MlRestTestStateCleaner.deleteAllTrainedModelIngestPipelines(MlRestTestStateCleaner.java:43) 
        at app//org.elasticsearch.xpack.core.ml.integration.MlRestTestStateCleaner.resetFeatures(MlRestTestStateCleaner.java:34)    
        at app//org.elasticsearch.xpack.test.rest.AbstractXPackRestTest.clearMlState(AbstractXPackRestTest.java:138)    
        at app//org.elasticsearch.xpack.test.rest.AbstractXPackRestTest.cleanup(AbstractXPackRestTest.java:118)
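
The interaction described above can be sketched without JUnit. The snippet below is a simplified model, not the JUnit or randomizedtesting implementation: `AssumptionViolated` and `CouldNotBeSkipped` are hypothetical stand-ins for `org.junit.AssumptionViolatedException` and `org.junit.TestCouldNotBeSkippedException`, and `run`/`outcome` are illustrative helpers. It shows a muted test being reported as skipped when its skip signal is the only throwable, and as a failure once the teardown also throws.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipDemo {
    /** Stand-in for the skip signal raised by a muted test. */
    static final class AssumptionViolated extends RuntimeException {
        AssumptionViolated(String message) { super(message); }
    }

    /** Stand-in for the wrapper used when a skip signal arrives together with other failures. */
    static final class CouldNotBeSkipped extends RuntimeException {
        CouldNotBeSkipped(AssumptionViolated cause) {
            super("Test could not be skipped due to other failures", cause);
        }
    }

    /** Runs the body, then always runs the teardown, collecting every throwable. */
    static List<Throwable> run(Runnable body, Runnable teardown) {
        List<Throwable> errors = new ArrayList<>();
        try {
            body.run();
        } catch (RuntimeException | Error e) {
            errors.add(e);
        }
        try {
            teardown.run();
        } catch (RuntimeException | Error e) {
            errors.add(e);
        }
        return errors;
    }

    /** A lone skip signal means skipped; a skip signal plus anything else is a failure. */
    static String outcome(List<Throwable> errors) {
        if (errors.isEmpty()) {
            return "PASSED";
        }
        if (errors.size() == 1) {
            Throwable only = errors.get(0);
            return only instanceof AssumptionViolated ? "SKIPPED" : "FAILED: " + only.getMessage();
        }
        for (Throwable error : errors) {
            if (error instanceof AssumptionViolated) {
                return "FAILED: " + new CouldNotBeSkipped((AssumptionViolated) error).getMessage();
            }
        }
        return "FAILED: multiple failures";
    }

    public static void main(String[] args) {
        Runnable mutedBody = () -> { throw new AssumptionViolated("skipped, reason: muted"); };
        Runnable brokenTeardown = () -> { throw new IllegalStateException("cleanup request returned 500"); };
        System.out.println(outcome(run(mutedBody, () -> {})));
        System.out.println(outcome(run(mutedBody, brokenTeardown)));
    }
}
```

In this model the muted test itself never contributes the real failure. The second throwable, here the cleanup error, is the one worth investigating.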