Open elasticsearchmachine opened 2 months ago
This has been muted on branch main
Mute Reasons:
Build Scans:
Pinging @elastic/es-distributed (Team:Distributed)
It seems that the shard really stays on the node after the relocation is successful. It seems we get as far as receiving the shard active response from all the other nodes. However, we don't go ahead with the deletion because of this check:
not deleting shard [index1][0], the latest cluster state version[23] is not equal to cluster state before shard active api call [22]
and that seems to be it, we won't try ever again! I'm not sure what else usually triggers a cleanup that would remove the shard much later (in the test it is the after-test cleanup that triggers org.elasticsearch.indices.IndicesService#processPendingDeletes). We could either try to facilitate that extra cleanup in the test or, perhaps more reasonably, look into why that cs version check is so strict. We should probably at least retry it.
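To make the check concrete, here is a minimal sketch of a strict version comparison that would produce the log line quoted above. It is not the actual Elasticsearch code: the class name `ShardActiveVersionCheck` and the method `skipReason` are hypothetical, and only the message text is taken from the log.

```java
public final class ShardActiveVersionCheck {

    /**
     * Hypothetical helper: returns null when the shard may be deleted,
     * otherwise the reason the deletion is skipped.
     */
    static String skipReason(String shardId, long versionBeforeCall, long latestVersion) {
        // Strict equality: any cluster state applied while the shard-active
        // requests were in flight makes this attempt give up.
        if (latestVersion != versionBeforeCall) {
            return String.format(
                "not deleting shard %s, the latest cluster state version[%d] is not equal to "
                    + "cluster state before shard active api call [%d]",
                shardId, latestVersion, versionBeforeCall);
        }
        return null;
    }

    public static void main(String[] args) {
        // The situation from the log: the call started at version 22, but 23 was applied meanwhile.
        System.out.println(skipReason("[index1][0]", 22, 23));
        // No update in between: nothing blocks the deletion.
        System.out.println(skipReason("[index1][0]", 22, 22) == null ? "delete" : "skip");
    }
}
```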
"we won't try ever again!"
This is not true: it seems we do retry on every cluster state update in IndicesStore. The problem, however, is that we quickly trigger another cluster state update. This can lead to a cycle in which each newer cluster state update makes the check above fail, so the shard is not deleted until we time out.
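The cycle can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the IndicesStore logic: each attempt records the cluster state version, some number of updates is applied while the shard-active responses are outstanding, and the attempt succeeds only if that number is zero. The names `RetryStarvationSketch` and `attemptsUntilDeleted` are hypothetical.

```java
public final class RetryStarvationSketch {

    /**
     * Simulates repeated deletion attempts.
     *
     * @param startVersion      cluster state version when the first attempt starts
     * @param updatesDuringCall for each attempt, how many cluster state updates are
     *                          applied while the shard-active call is in flight
     * @return the 1-based attempt that deleted the shard, or -1 if none did
     */
    static int attemptsUntilDeleted(long startVersion, int[] updatesDuringCall) {
        long version = startVersion;
        for (int attempt = 0; attempt < updatesDuringCall.length; attempt++) {
            long versionBeforeCall = version;
            version += updatesDuringCall[attempt];
            if (version == versionBeforeCall) {
                return attempt + 1; // no update interfered, deletion goes ahead
            }
            // Mismatch: this attempt is skipped. The update that caused the
            // mismatch is also what triggers the next attempt.
        }
        return -1;
    }

    public static void main(String[] args) {
        // An update lands during every attempt: the shard is never deleted.
        System.out.println(attemptsUntilDeleted(22, new int[] {1, 1, 1, 1})); // prints -1
        // The cluster goes quiet during the third attempt: deletion succeeds there.
        System.out.println(attemptsUntilDeleted(22, new int[] {1, 1, 0}));    // prints 3
    }
}
```

In this model the retry itself is not the problem; the attempts only succeed once the cluster state stops changing for the duration of one shard-active round trip.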
"why that cs version check is so strict"
I think the check above needs to be that strict to make sure the shards have not moved and are active where they are, since that is the precondition for deleting the local shard store.
Build Scans:
Reproduction Line:
Applicable branches: main
Reproduces locally?: N/A
Failure History: See dashboard (filtered to suite org.elasticsearch.cluster.PrevalidateShardPathIT, test testCheckShards)
Failure Message:
Issue Reasons:
Note: This issue was created using new test triage automation. Please report issues or feedback to es-delivery.