Azure / azure-linux-extensions

Linux Virtual Machine Extensions for Azure
Apache License 2.0
304 stars 252 forks source link

Azure Backup job is failing to unfreeze Ubuntu OS after running safefreeze #1868

Open Mi1an opened 7 months ago

Mi1an commented 7 months ago

Hello,

this is second time (of 3 times total) when we encountered an error with fsfreeze during Azure Backup. Our system is during the Azure Backup freezed but it's never unfreezed. The whole server is stuck and the only thing we can do is deallocate Azure VM and start it again.

When we checked logs we see that the last thing logged into the extension.log is run of a procedure safefreeze and proceed for accepting singal. Raw log: 2024-01-14 22:16:48.792020 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]PreSnapshot: Status Code: 200 2024-01-14 22:16:48.794220 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]Taking Snapshot through Host 2024-01-14 22:16:48.796446 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]T:S freeze, timeout value 60 2024-01-14 22:16:48.798717 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]skipped mount : 2024-01-14 22:16:48.800908 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]fsfreeze mount :/mnt 2024-01-14 22:16:48.803464 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]fsfreeze mount :/ 2024-01-14 22:16:48.805675 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]skip freeze is : False 2024-01-14 22:16:48.811102 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]arg : ['/var/lib/waagent/Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0.9207.0/main/safefreeze/bin/safefreeze', '60', '/'] 2024-01-14 22:16:48.813571 [Microsoft.Azure.RecoveryServices.VMSnapshotLinux-1.0]proceeded for accepting signals

In Azure Backup job report we see the job as failed with these details: ExtensionOperationInProgress Command execution failed. Another operation is in progress on this item. Please wait until the previous operation is completed.

We set everything correct like this documentation states: https://learn.microsoft.com/en-us/azure/backup/backup-azure-linux-app-consistent

We are unable to identify the core of the issue deeper. We assume that it's a bug because the system is never unfreezed as it should be. We can't affect this behavior because it's initialized from Azure Backup agent job.

Our OS: Ubuntu 18.04.6 LTS

VM agent version: 2.7.3.0

katatohuk commented 4 months ago

we have the same issue, have been in conversation with MS AZure Backup Team for a few month already... nothing so far. All we know it's fsfreeze which cause the issue and VM goes bananas... the main symptom is avg high load just go crazy.