Closed luigidellaquila closed 7 months ago
Pinging @elastic/es-search (Team:Search)
Looked into the test failure, and AFAICT this seems to be purely a matter of timing. It seems that the specified timeout (getWaitForCheckpointsTimeout) elapses before we reach the expected exception point (where we check the seq. no. and throw an IllegalArgumentException because that many operations have not been performed yet). There are 30 such failures in the past 3 months ( https://es-delivery-stats.elastic.dev/app/r/s/USaH0 ) with the following 4 error messages:
org.elasticsearch.discovery.MasterNotDiscoveredException: org.elasticsearch.cluster.block.ClusterBlockException: index [index] blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]; (18 instances)
java.lang.AssertionError: null (2 instances)
junit.framework.AssertionFailedError: Unexpected exception type, expected IllegalArgumentException but got org.elasticsearch.ElasticsearchTimeoutException: Wait for seq_no [1] refreshed timed out (9 instances)
java.lang.Exception: Test abandoned because suite timeout was reached. (1 instance)
The first exception started Jan. 6 and has also been reported and addressed in this issue.
Looking at the exception in the description now, it seems that all but one of the failures occurred with very low timeout thresholds (10-25ms), and we had just one instance with 95ms.
I believe that this is just a test issue and no changes in production code are required. Given that we want to test the seq. no. and not the timeout in this case, I'd suggest simply increasing the timeout to safer bounds (e.g. [100-200] or [200-300]) to eliminate this potential issue and focus solely on what we actually want to test. The timeout itself is covered by a different test, SearchServiceTests#testWaitOnRefreshTimeout.
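To illustrate the race described above, here is a minimal, self-contained sketch. It is not Elasticsearch code: the class TimeoutRaceSketch and the methods delayedSeqNoCheck and observedFailure are hypothetical names made up for this example, and the delays are arbitrary. It only models the idea that a caller waiting with a short timeout can see a timeout before the validation that would have thrown IllegalArgumentException gets to run.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model of the race (hypothetical names, not the real SearchService code).
class TimeoutRaceSketch {

    // Stand-in for the path under test: after checkDelayMillis the
    // sequence-number check runs and rejects the request.
    static CompletableFuture<Void> delayedSeqNoCheck(long checkDelayMillis) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(checkDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            throw new IllegalArgumentException("checkpoint has not been indexed yet");
        });
    }

    // Returns the simple class name of the failure the waiting caller observes.
    static String observedFailure(long checkDelayMillis, long timeoutMillis) {
        try {
            delayedSeqNoCheck(checkDelayMillis).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "none";
        } catch (TimeoutException e) {
            // The wait gave up before the check had a chance to throw.
            return "TimeoutException";
        } catch (ExecutionException e) {
            // The check ran in time; its exception is the cause.
            return e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "InterruptedException";
        }
    }

    public static void main(String[] args) {
        // Timeout far below the time needed to reach the check: the timeout wins.
        System.out.println(observedFailure(300, 10));
        // Generous timeout: the check is reached and its exception surfaces.
        System.out.println(observedFailure(10, 2000));
    }
}
```

In this model, widening the timeout relative to the time it takes to reach the check is what makes the expected exception type show up reliably, which is the same reasoning behind raising the randomized bounds in the test.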
Build scan: https://gradle-enterprise.elastic.co/s/tc2hypko6u4fm/tests/:server:test/org.elasticsearch.search.SearchServiceTests/testWaitOnRefreshFailsIfCheckpointNotIndexed
Reproduction line:
Applicable branches: main
Reproduces locally?: No
Failure history: https://es-delivery-stats.elastic.dev/app/dashboards#/view/dcec9e60-72ac-11ee-8f39-55975ded9e63?_g=(refreshInterval:(pause:!t,value:60000),time:(from:now-7d%2Fd,to:now))&_a=(controlGroupInput:(chainingSystem:HIERARCHICAL,controlStyle:twoLine,ignoreParentSettings:(ignoreFilters:!f,ignoreQuery:!f,ignoreTimerange:!f,ignoreValidations:!t),panels:('0c0c9cb8-ccd2-45c6-9b13-96bac4abc542':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:task.keyword,grow:!t,id:'0c0c9cb8-ccd2-45c6-9b13-96bac4abc542',searchTechnique:wildcard,selectedOptions:!(),singleSelect:!t,title:'Gradle%20Task',width:medium),grow:!t,order:0,type:optionsListControl,width:small),'144933da-5c1b-4257-a969-7f43455a7901':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:name.keyword,grow:!t,id:'144933da-5c1b-4257-a969-7f43455a7901',searchTechnique:wildcard,selectedOptions:!('testWaitOnRefreshFailsIfCheckpointNotIndexed'),title:Test,width:medium),grow:!t,order:2,type:optionsListControl,width:medium),'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850':(explicitInput:(dataViewId:fbbdc689-be23-4b3d-8057-aa402e9ed0c5,enhancements:(),fieldName:className.keyword,grow:!t,id:'4e6ad9d6-6fdc-4fcc-bf1a-aa6ca79e0850',searchTechnique:wildcard,selectedOptions:!('org.elasticsearch.search.SearchServiceTests'),title:Suite,width:medium),grow:!t,order:1,type:optionsListControl,width:medium))))
Failure excerpt: