Open premun opened 4 days ago
This is the first build it started happening in: https://dev.azure.com/dnceng/internal/_build/results?buildId=2566376&view=results
Looking at this now. I wonder if we had some kind of bad commit that's screwing us here, and if just forcefully rolling out would help
Okay, I think I see what's going on. The replicas we're starting after a deployment appear to be wrong. This means that the newly deployed replicas always have the status Stopped after the deployment. So when the deployment is happening, we set the status to Stopping, but a workitem is never finished, so it's never set to Stopped.
It's a bug that got introduced in https://github.com/dotnet/arcade-services/pull/4072 I think.
Another point is that we shouldn't set the status to Stopping if we're already Stopped
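To illustrate the guard described above, here is a minimal sketch in Python. It is not the arcade-services code: `State`, `ReplicaStatus`, `request_stop` and `work_item_finished` are hypothetical names, and the status store is assumed to be a plain in-memory field. The only point is that a stop request becomes a no-op when the replica is already Stopped, so it cannot get stuck in Stopping while waiting for a workitem that never finishes.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names; the real service may use different ones.
    RUNNING = "Running"
    STOPPING = "Stopping"
    STOPPED = "Stopped"


class ReplicaStatus:
    """Hypothetical in-memory stand-in for a replica's status record."""

    def __init__(self, state: State) -> None:
        self.state = state

    def request_stop(self) -> None:
        # Only a running replica has to drain. An already stopped replica
        # has no in-flight workitem whose completion would move it from
        # Stopping to Stopped, so its status is left untouched.
        if self.state is State.RUNNING:
            self.state = State.STOPPING

    def work_item_finished(self) -> None:
        # Finishing the in-flight workitem completes a pending stop.
        if self.state is State.STOPPING:
            self.state = State.STOPPED


if __name__ == "__main__":
    fresh = ReplicaStatus(State.STOPPED)  # newly deployed replica
    fresh.request_stop()                  # deployment asks it to stop
    print(fresh.state.value)              # prints "Stopped"
```

Without the check in `request_stop`, the freshly deployed replica in the usage example would move to Stopping and stay there, which matches the symptom seen in the build.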
Also, this is happening in staging too
The question is how the scenario tests are passing. It must be some revision that's still running and doing all the work
Yes, this appears to be the case. We start the same revision we tried to stop before. So currently, when we deploy, we run the scenario tests on the previous revision
Somehow the service never goes from Stopping to Stopped: https://dev.azure.com/dnceng/internal/_build/results?buildId=2566414&view=logs&j=d834f0ef-b202-5dd2-50f7-dc59af38ca7d&t=c5f81511-ed74-5842-0962-8d98850568fa&l=270
Happened now 4 times in a row