elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[CI] BasicDistributedJobsIT testFailOverBasics failing #103059

Open thecoop opened 10 months ago

thecoop commented 10 months ago

Build scan: https://gradle-enterprise.elastic.co/s/y37ifjzcebahk/tests/:x-pack:plugin:ml:internalClusterTest/org.elasticsearch.xpack.ml.integration.BasicDistributedJobsIT/testFailOverBasics

Reproduction line:

./gradlew ':x-pack:plugin:ml:internalClusterTest' --tests "org.elasticsearch.xpack.ml.integration.BasicDistributedJobsIT.testFailOverBasics" -Dtests.seed=D6FED1DD34FF1D47 -Dtests.locale=de-AT -Dtests.timezone=Canada/Atlantic -Druntime.java=20

Applicable branches: 7.17

Reproduces locally?: Didn't try

Failure history: Failure dashboard for org.elasticsearch.xpack.ml.integration.BasicDistributedJobsIT#testFailOverBasics

Failure excerpt:

java.lang.AssertionError: expected candidate but was FOLLOWER

  at __randomizedtesting.SeedInfo.seed([D6FED1DD34FF1D47]:0)
  at org.elasticsearch.cluster.coordination.Coordinator.becomeLeader(Coordinator.java:765)
  at org.elasticsearch.cluster.coordination.Coordinator.processJoinRequest(Coordinator.java:711)
  at org.elasticsearch.cluster.coordination.Coordinator.lambda$handleJoinRequest$8(Coordinator.java:594)
  at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:219)
  at org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:101)
  at org.elasticsearch.action.support.ListenableActionFuture.executeListener(ListenableActionFuture.java:89)
  at org.elasticsearch.action.support.ListenableActionFuture.addListener(ListenableActionFuture.java:54)
  at org.elasticsearch.cluster.coordination.Coordinator$1.onResponse(Coordinator.java:633)
  at org.elasticsearch.cluster.coordination.Coordinator$1.onResponse(Coordinator.java:630)
  at org.elasticsearch.action.ActionListener$DelegatingActionListener.onResponse(ActionListener.java:186)
  at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43)
  at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1471)
  at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:352)
  at org.elasticsearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:340)
  at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  at java.lang.Thread.run(Thread.java:1623)

elasticsearchmachine commented 10 months ago

Pinging @elastic/ml-core (Team:ML)

droberts195 commented 7 months ago

This failure is in cluster coordination code, not specifically related to ML: https://github.com/elastic/elasticsearch/blob/edea203e7cbda918dbc88751a72f174c667b2057/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java#L765

@elastic/es-distributed is this of any interest?

The failure that led to this issue being opened appears to be the only time this test has ever failed in this particular way, so maybe the "expected candidate but was FOLLOWER" assertion was a symptom of a one-off problem with the VM that ran this particular CI build?
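
For readers unfamiliar with the assertion, here is a minimal sketch of the kind of mode check that would produce this message. It is not the Elasticsearch implementation: the class name CoordinatorModeSketch and its methods are hypothetical, and the sequence in main is only one assumed way a node could reach becomeLeader while no longer a candidate. It uses an explicit throw rather than the assert keyword so that it behaves the same without the -ea JVM flag.

```java
public class CoordinatorModeSketch {
    // The three modes named in the failure message and stack trace.
    enum Mode { CANDIDATE, LEADER, FOLLOWER }

    // Hypothetical stand-in for a node's coordination state.
    private Mode mode = Mode.CANDIDATE;

    synchronized void becomeFollower() {
        mode = Mode.FOLLOWER;
    }

    synchronized void becomeLeader() {
        // Only a candidate may be promoted; any other mode is a bug or an unexpected race.
        if (mode != Mode.CANDIDATE) {
            throw new AssertionError("expected candidate but was " + mode);
        }
        mode = Mode.LEADER;
    }

    synchronized Mode getMode() {
        return mode;
    }

    public static void main(String[] args) {
        CoordinatorModeSketch node = new CoordinatorModeSketch();

        // Assumed sequence: the node has already started following another
        // node when a late join response tries to promote it.
        node.becomeFollower();
        try {
            node.becomeLeader();
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // prints "expected candidate but was FOLLOWER"
        }
    }
}
```

If the real failure came from an ordering like this, it would depend on thread timing, which would be consistent with it having been seen only once.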