Open trickyearlobe opened 4 years ago
We tried to reproduce the issue by deploying two extensions, both MSI-based installers, simultaneously. The Azure Chef Client extension installed without failure once the first extension's installation had completed, and the status of both extensions was reported to the Azure portal as success.
Steps taken: we ran the following command from PowerShell for the Azure Chef extension while simultaneously deploying another extension through the Azure portal. Command:
az vm extension set --resource-group "ash-airgap-grp" --vm-name "ayu-win1" --name ChefClient --publisher Chef.Bootstrap.WindowsAzure --version 1210.13.4.1 --no-auto-upgrade true --protected-settings "{'validation_key': '', 'client_key': '', 'client_rb': 'D:\chef\chef-repo-ash\chef-repo\chef-repo\.chef\client.rb'}" --settings "{ 'bootstrap_options': { 'chef_server_url': '', 'chef_node_name': 'ayu-newtest', 'node_ssl_verify_mode': 'none' }, 'runlist': '[recipe[cbk1::default]]', 'CHEF_LICENSE':'accept', 'chef_package_url':'https://storageblobpkgash.blob.core.windows.net/packages/chef-client-16.4.41-1-x64.msi'}"
Result:
The Azure Chef Client extension was installed after the other extension had been installed on the VM. Both were registered in the Azure portal, and the node was also created in Chef Manage.
We may need some more details to reproduce the issue.
https://github.com/chef-partners/azure-chef-extension/pull/344 may help with diagnosing this issue
@ayushbhatt29 That this works for you is a matter of fortunate timing. Windows Installer only allows a single instance of InstallExecuteSequence to be running at a time, and other installations will fail instead of waiting.
There are a number of ways to poll if Windows Installer is currently active to reduce (but not eliminate) the likelihood of hitting the race condition. One is looking for the HKLM\Software\Microsoft\Windows\CurrentVersion\Installer\InProgress registry key (keep the wow64 registry redirector in mind). A more crude one would be to look for a currently executing msiexec process.
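A minimal sketch of the registry check, in Python purely for illustration (the extension itself is not written in Python). It assumes the InProgress key lives in the 64-bit registry view, so it passes `KEY_WOW64_64KEY` to bypass the WOW64 redirector when called from a 32-bit process. The function name `msi_in_progress_key_exists` is hypothetical, and on non-Windows hosts it simply returns False.

```python
import sys

# Key named in the comment above, relative to HKEY_LOCAL_MACHINE.
INSTALLER_KEY = r"Software\Microsoft\Windows\CurrentVersion\Installer\InProgress"


def msi_in_progress_key_exists() -> bool:
    """Return True if the Installer InProgress key exists (64-bit view)."""
    if sys.platform != "win32":
        return False  # no Windows Installer on this platform

    import winreg  # Windows-only standard library module

    try:
        # KEY_WOW64_64KEY asks for the 64-bit view even from a 32-bit
        # process, sidestepping the WOW64 registry redirector.
        handle = winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE,
            INSTALLER_KEY,
            0,
            winreg.KEY_READ | winreg.KEY_WOW64_64KEY,
        )
    except FileNotFoundError:
        return False  # key absent: no install recorded as in progress
    winreg.CloseKey(handle)
    return True
```

As noted above, this only narrows the window: another installer can still start between the check and our own msiexec call.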
We should add a check for one of these and wait for completion. At the very least we can output some logging in this case. However, there is a real risk that waiting will cause us to exceed the limited time allowed to us by the Azure extension framework and fail anyway.
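The wait-with-logging idea could look roughly like the sketch below: poll a "busy" probe until it clears or a deadline passes, so the wait cannot silently eat the whole time budget. Everything here is an assumption for illustration: the name `wait_until_idle`, the default timeout and interval, and the injected `clock`, `sleep` and `log` hooks. The real time limit imposed by the Azure extension framework is not stated in this thread.

```python
import time
from typing import Callable


def wait_until_idle(
    is_busy: Callable[[], bool],
    timeout_s: float = 300.0,   # assumed budget, not an Azure-documented value
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll is_busy() until it returns False or timeout_s elapses.

    Returns True if the installer became idle in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while is_busy():
        remaining = deadline - clock()
        if remaining <= 0:
            log("Windows Installer still busy; giving up waiting")
            return False
        log(f"Windows Installer busy; retrying ({remaining:.0f}s left)")
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
    return True
```

A caller would pass whichever probe it trusts (registry key, process check, or mutex) as `is_busy`, and treat a False result as "log it and fail cleanly" rather than attempting the install anyway.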
The customer that originally reported this to me used #344 to identify that there was an MSI fight going on.
The workaround they are using in their Azure DINE policy (Deploy If Not Exist) is to
DINE policy dependencies are not always possible, so we should still fix the retry mechanism so that it correctly updates the deployed status for the Azure framework.
Inconsistency in reporting deployment status when we retry the install/bootstrap is still a problem for them when the deployment fails for other reasons (Chef server offline, network problems, etc.).
Thanks @btm and @trickyearlobe, we will work on it.
From Richard Nixon: The request to add MSI installer logging was actually implemented, and Aftab managed to diagnose that there was indeed an "MSI fight" going on. UBS have now made AzureChefExtension dependent on the clashing Extension which causes them to install serially (meaning they no longer fight with each other).
We do still need to fix the problem that if the install has to be retried (typically Azure reruns it after 90 mins), the status is not correctly updated to the Azure framework, and looks as if it hasn't deployed correctly.
In addition, Aftab mentioned in https://getchef.zendesk.com/agent/tickets/27765 that the issue occurs in about 20% of cases when the following extensions are configured without the dependencies trick.
There are a number of ways to poll if Windows Installer is currently active to reduce (but not eliminate) the likelihood of hitting the race condition. One is looking for the HKLM\Software\Microsoft\Windows\CurrentVersion\Installer\InProgress registry key (keep the wow64 registry redirector in mind). A more crude one would be to look for a currently executing msiexec process.
I believe what you are looking for is the _MSIExecute mutex
https://docs.microsoft.com/en-us/windows/win32/msi/-msiexecute-mutex
Msiexec can be running, and not actually holding the mutex. This gets you as close as you can get to avoiding the race condition.
Thanks @caroysMSFT, that's helpful. @ayushbhatt29, looks like we just need to call QueryServiceStatusEx to get the status of the MSI Installer service to avoid an MSI fight.
We still need to make sure we correctly update state info for Azure (important for reporting on deployment state in large estates)
QueryServiceStatusEx was the wrong takeaway from that article. You need to query the named system mutex "_MSIExecute" to see if someone is holding on to it.
This article is a good start
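For reference, a hedged sketch of the mutex query described above, written in Python with `ctypes` for illustration only. It assumes the mutex is reachable under the name `Global\_MSIExecute`, opens it with `SYNCHRONIZE` access, and does a zero-timeout wait: a timeout means another installer holds it. The function name `msi_execute_mutex_held` is hypothetical, and treating any failure to open the mutex as "not held" is a simplification.

```python
import sys

# Win32 constants from the Windows SDK headers.
SYNCHRONIZE = 0x00100000
WAIT_OBJECT_0 = 0x00000000
WAIT_ABANDONED = 0x00000080
WAIT_TIMEOUT = 0x00000102


def msi_execute_mutex_held() -> bool:
    """Return True if another process currently owns the _MSIExecute mutex."""
    if sys.platform != "win32":
        return False  # no Windows Installer on this platform

    import ctypes
    from ctypes import wintypes

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.OpenMutexW.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.LPCWSTR]
    k32.OpenMutexW.restype = wintypes.HANDLE
    k32.WaitForSingleObject.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    k32.WaitForSingleObject.restype = wintypes.DWORD
    k32.ReleaseMutex.argtypes = [wintypes.HANDLE]
    k32.ReleaseMutex.restype = wintypes.BOOL
    k32.CloseHandle.argtypes = [wintypes.HANDLE]
    k32.CloseHandle.restype = wintypes.BOOL

    # Assumed name: the mutex in the global kernel namespace.
    handle = k32.OpenMutexW(SYNCHRONIZE, False, "Global\\_MSIExecute")
    if not handle:
        # Mutex does not exist or cannot be opened: treated as not held.
        return False
    try:
        result = k32.WaitForSingleObject(handle, 0)  # 0 ms: do not block
        if result in (WAIT_OBJECT_0, WAIT_ABANDONED):
            # We briefly acquired it, so nobody else held it; give it back.
            k32.ReleaseMutex(handle)
            return False
        # WAIT_TIMEOUT means someone else owns it; anything else is an error.
        return result == WAIT_TIMEOUT
    finally:
        k32.CloseHandle(handle)
```

Even with this check, another installer can take the mutex between the query and the start of our own install, so it reduces the race window rather than closing it.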
Thanks @caroysMSFT for the suggestion.
We tried recreating the issue and introduced the MSI installer logging using
Test-Path HKLM:\Software\Microsoft\Windows\CurrentVersion\Installer\InProgress
However, we were unable to reproduce the issue at hand.
While trying to install the Chef extension along with other extensions that require the MSI installer, the installation succeeded every time, and the MSI installer status check never logged a clash.
I believe we would require some more details on how to recreate this issue.
If Azure Chef Client extension is deployed to a machine along with another extension that needs the MSI installer, the Chef Client Extension fails because the MSI installer is busy.
Wait/Retries are not correctly handled in the extension, but if DINE policy (Deploy if not exists) causes a retry, the client will eventually install, but does not update its status to Azure.