Closed (benwtrent closed this 6 months ago)
Pinging @elastic/es-data-management (Team:Data Management)
I don't think I've seen this before. The problem is that the write to watcher history was rejected:
1> [2024-03-05T11:59:23,221][ERROR][o.e.x.w.Watcher ] [node_s0] error executing bulk
1> org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of TimedRunnable{original=org.elasticsearch.action.bulk.TransportBulkAction$2/org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener/org.elasticsearch.tasks.TaskManager$1{org.elasticsearch.action.support.ContextPreservingActionListener/org.elasticsearch.action.bulk.Retry2$RetryHandler@6ac12a57}{Task{id=246, type='transport', action='indices:data/write/bulk', description='requests[1], indices[.watcher-history-16]', parentTask=unset, startTime=1709639963220, startTimeNanos=989696122872}}/org.elasticsearch.action.support.TransportAction$$Lambda/0x00007f8e84aa7b30@601084c9/org.elasticsearch.action.bulk.TransportBulkAction$$Lambda/0x00007f8e84e10000@7877bfed, creationTimeNanos=989696235664, startTimeNanos=0, finishTimeNanos=-1, failedOrRejected=false} on TaskExecutionTimeTrackingEsThreadPoolExecutor[name = node_s0/write, queue capacity = 1, task execution EWMA = 10.7ms, total task execution time = 190ms, org.elasticsearch.common.util.concurrent.TaskExecutionTimeTrackingEsThreadPoolExecutor@7ed8b40b[Running, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 11]]
1> at org.elasticsearch.common.util.concurrent.EsRejectedExecutionHandler.newRejectedException(EsRejectedExecutionHandler.java:51) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
1> at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:35) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
1> at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:841) ~[?:?]
1> at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1376) ~[?:?]
1> at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:72) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
1> at org.elasticsearch.action.bulk.TransportBulkAction.forkAndExecute(TransportBulkAction.java:294) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
1> at org.elasticsearch.action.bulk.TransportBulkAction.ensureClusterStateThenForkAndExecute(TransportBulkAction.java:289) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
1> at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:248) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
1> at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:84) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
1> at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:96) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
1> at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:68) ~[elasticsearch-8.13.0-SNAPSHOT.jar:8.13.0-SNAPSHOT]
...
I'm not sure yet why that would be though.
The watcher history write was rejected because we set the write queue size to 1, and I assume something else was already in the queue. I don't think that this is the point of the test (it's desired behavior that we immediately reject writes if the write queue is full).
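To illustrate the mechanism described above, here is a minimal sketch using only plain `java.util.concurrent` (not Elasticsearch's own `EsThreadPoolExecutor` or `EsAbortPolicy`, which the stack trace shows delegating to `ThreadPoolExecutor.reject`). It assumes a pool shaped like the one in the log (pool size = 1, queue capacity = 1); the class and method names are hypothetical and exist only for this example. With one task running and one queued, the next submission is rejected immediately.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: NOT Elasticsearch code, just the JDK behavior it builds on.
public class WriteQueueRejectionSketch {

    /**
     * Submits {@code tasks} blocking tasks to a pool with one thread and a
     * queue of capacity one, and returns how many submissions were rejected.
     */
    public static int countRejections(int tasks) {
        // Keeps every accepted task blocked so the pool stays saturated.
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1),            // queue capacity = 1
            new ThreadPoolExecutor.AbortPolicy());  // throw on overflow
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Thread busy and queue full: the submission fails right away.
                    rejected++;
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // First task runs, second is queued, third is rejected.
        System.out.println("rejected = " + countRejections(3)); // prints "rejected = 1"
    }
}
```

In this sketch the first task goes straight to the single worker thread, the second fills the only queue slot, and everything after that is rejected until a slot frees up. That matches the state in the log above (active threads = 1, queued tasks = 1) at the moment the history write arrived.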
I observed this error after the fix from #106134 was merged: https://gradle-enterprise.elastic.co/s/y6wmgil4wjfda/tests/task/:x-pack:plugin:watcher:internalClusterTest/details/org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests/testHistoryOnRejection?top-execution=1
I've re-muted this test since the above failure reproduces for me.
I can see in the logs that a rejection did occur. However, the history index seems to ALSO have rejected the write?
Build scan: https://gradle-enterprise.elastic.co/s/xjimegducgx5i/tests/:x-pack:plugin:watcher:internalClusterTest/org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests/testHistoryOnRejection
Reproduction line:
Applicable branches: 8.13
Reproduces locally?: No
Failure history: Failure dashboard for org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests#testHistoryOnRejection
Failure excerpt: