Open stu-elastic opened 8 months ago
Pinging @elastic/es-delivery (Team:Delivery)
@breskeby this sounds like it might be related to https://github.com/elastic/elasticsearch/pull/101069. Looking at the cluster logs it looks like we're attempting to start an already started cluster, which would explain the error above. Perhaps the updated logic is losing track of clusters that are used across multiple tasks as is the case for many BWC tests. My guess is some state is getting confused when we upgrade nodes in a cluster.
@mark-vieira From a brief look at the logic we changed and the project in question, I couldn't see how that change affected this, and I wasn't able to reproduce it. I'll take a fresh look tomorrow, as it does seem related: we started seeing this failure after the change we made in #101069.
@breskeby looks like this is still happening occasionally: https://github.com/elastic/elasticsearch/issues/103839
Another failure today: https://gradle-enterprise.elastic.co/s/izhi63q6ustnw
And another: https://gradle-enterprise.elastic.co/s/6ey6xm4uylriy
Note, though, that this one was a failure of x-pack:plugin:eql:qa:ccs-rolling-upgrade:v8.13.0#oneThirdUpgraded, not the specific test indicated in the issue description. The "failed to obtain node locks" error and stack trace are present, so I thought it was fair to attach it to this issue.
I ran into this failure in a PR: https://gradle-enterprise.elastic.co/s/lwkluhs5zpwf6/console-log?page=3#L2846
I also noticed that it happened today on the main branch: https://gradle-enterprise.elastic.co/s/e4ca4ihilzigw/console-log?page=2#L1183
Another one today: https://gradle-enterprise.elastic.co/s/coocr6hsiw7ny
We had one in the intake build on 17 March: https://gradle-enterprise.elastic.co/s/a4567iuaplgju/
Here is another intake build failure due to this: https://gradle-enterprise.elastic.co/s/h4gi5trbgx5rk
All three nodes crashed due to failing to obtain locks on their data paths.
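For anyone following along, here is a minimal sketch of why the "start an already started cluster" hypothesis from earlier in the thread would surface as a lock failure. It is not the Elasticsearch or test-cluster code. It only assumes that a node takes an exclusive file lock under its data path at startup, and the class name `NodeLockSketch`, the method `secondStartIsRejected`, and the lock file name `node.lock` are illustrative choices. While the first lock is held, a second attempt on the same path is rejected.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: models "one exclusive lock per data path", not the real node startup.
final class NodeLockSketch {

    // Lock file name chosen for this sketch (an assumption, not taken from the thread).
    static final String LOCK_FILE = "node.lock";

    /**
     * Takes an exclusive lock on the data path (the "first start"), then tries to
     * lock the same path again (the "second start") while the first lock is held.
     * Returns true if the second attempt is rejected.
     */
    static boolean secondStartIsRejected(Path dataPath) throws IOException {
        Path lockFile = dataPath.resolve(LOCK_FILE);
        try (FileChannel first = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock held = first.tryLock()) {
            if (held == null) {
                // Another process already owns the lock, so the sketch cannot run.
                throw new IllegalStateException("data path is locked by another process");
            }
            try (FileChannel second = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                // tryLock returns null when another process holds the lock ...
                FileLock again = second.tryLock();
                if (again != null) {
                    again.release();
                }
                return again == null;
            } catch (OverlappingFileLockException e) {
                // ... and throws when this same JVM already holds an overlapping lock.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dataPath = Files.createTempDirectory("node-lock-sketch");
        System.out.println("second start rejected: " + secondStartIsRejected(dataPath));
    }
}
```

If something like this is what's happening, the node that fails is a second start against a data path whose previous process is still running (or has not yet released its lock), which would fit clusters shared across multiple upgrade tasks.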
I think we want to move this pull request forward. The downside is it'll probably make the test execution a bit slower but I think the improvement in stability is probably worth it. I'll pick this back up.
https://gradle-enterprise.elastic.co/s/l34azxsevqole looks like another instance of this
Another one: https://gradle-enterprise.elastic.co/s/q3tppdm2xbp3m
Same error for :qa:ccs-rolling-upgrade-remote-cluster:v8.15.0#twoThirdUpgraded
CI Link
https://gradle-enterprise.elastic.co/s/augsybdqwff3i
Repro line
:x-pack:plugin:shutdown:qa:rolling-upgrade:v8.12.0-1
Does it reproduce?
Didn't try
Applicable branches
main
Failure history
No response
Failure excerpt