eehret opened 2 months ago
I've noticed that this also seems to happen when I delete guest assignments via the Azure portal, even when other guest assignments remain on the same virtual machine and, in theory, the worker should still have work to do.
I'm not 100% sure what causes this, but we've seen it on quite a few Azure virtual machines.
Something seems to make the gc worker conclude that there is nothing left to do, and it simply stops. All of the existing guest assignments then eventually disappear from the Azure portal, because the gc worker is no longer reporting compliance and the guest assignments expire. Even the 'AzureLinuxBaseline' guest assignments disappear.
Here's what the tail end of the gc_worker.log looks like when this condition gets triggered:
When this happens, it is also impossible to restart gcd.service: the restart hangs indefinitely while waiting on a process.
The only way I've found so far to recover, short of a reboot (which we can't do whenever we want in a production environment), is to manually kill the process that the restart is waiting on.
I'm at a loss as to how to troubleshoot this further. If Microsoft would like more information, or would like to work with me to gather it, I can make myself available.
In this specific instance, the OS was Ubuntu 20.04 LTS (the Azure Marketplace image from Canonical).
I think I've found a workaround that might be feasible for us until this issue is properly resolved: an entry in /etc/cron.daily that restarts gcd before it has a chance to reach the 'hung' state. So far, that seems to have helped on the one VM where I've tried it.
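A minimal sketch of what such a cron.daily entry could look like, written in Python. It rests on assumptions that are not in the report: the unit is named `gcd.service`; a restart that takes longer than a hypothetical `RESTART_TIMEOUT_S` of 120 seconds counts as the hung state; and, in that case, sending SIGKILL to every process in the unit with `systemctl kill --signal=SIGKILL` and then starting it again is acceptable. The file name `/etc/cron.daily/restart-gcd` is also hypothetical. Note that Debian/Ubuntu `run-parts` skips files whose names contain a dot, so a `.py` extension would prevent it from running, and the file must be executable.

```python
#!/usr/bin/env python3
"""Hypothetical /etc/cron.daily/restart-gcd: restart gcd.service once a day.

Assumptions (not from the original report): the unit is gcd.service, a restart
slower than RESTART_TIMEOUT_S is treated as the "hung" state, and SIGKILLing the
unit's processes before starting it again is an acceptable fallback.
"""
import subprocess
import sys

UNIT = "gcd.service"
RESTART_TIMEOUT_S = 120  # hypothetical threshold, tune for your environment


def restart_unit(unit=UNIT, timeout=RESTART_TIMEOUT_S, run=subprocess.run):
    """Try a normal restart; if it hangs, kill the unit's processes and start it.

    Returns a short status string describing which path was taken.
    `run` is injectable so the logic can be exercised without systemd.
    """
    try:
        # Normal path: systemctl blocks until the restart job finishes.
        run(["systemctl", "restart", unit], check=True, timeout=timeout)
        return "restarted"
    except subprocess.TimeoutExpired:
        # Restart is stuck waiting on a process: SIGKILL everything in the unit.
        run(["systemctl", "kill", "--signal=SIGKILL", unit],
            check=True, timeout=timeout)
        # Make sure the unit ends up running again.
        run(["systemctl", "start", unit], check=True, timeout=timeout)
        return "killed-and-started"


if __name__ == "__main__":
    try:
        print(restart_unit())
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
        # Non-zero exit lets cron/run-parts report the failure.
        print(f"restart of {UNIT} failed: {exc}", file=sys.stderr)
        sys.exit(1)
```

The kill fallback is only an automated version of the manual kill described above; it works around the hang rather than explaining it, so the gc_worker.log output is still worth collecting when the fallback path runs.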