Closed: edwardchalstrey1 closed this issue 8 months ago
Possibly related https://github.com/Azure/azure-linux-extensions/issues/1116
https://github.com/Azure/azure-linux-extensions/issues/1116#issuecomment-637677864
I haven't tested this out yet but I believe the issue I was running into may be because of the policy that automatically deployed the OMS Agent to applicable VMs in the subscription. I noticed that when looking at the VMSS instances not able to install extension due to error 52 (marker file) there was already an installation of the OMS agent on the machine and it was pointing to the default workspace. This I believe is because of the Azure policy.
Can we update `./Setup_SRE_Monitoring.ps1` to:
It does seem like that would be a way around this, as re-running the same commands later seems to identify a misconfigured extension and handle it correctly.
Look like we are already doing that in the called function...
@JimMadge do you think it would be sensible to change line 497, `foreach ($i in 1..5) {`, from attempting it just 5 times to either a very high number of attempts or an infinite loop (e.g. with a `while`)?
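For illustration, a bounded retry with a progressive back-off might be a middle ground between 5 attempts and an infinite `while`. This is only a sketch: `$maxAttempts` and `Install-MonitoringExtension` are placeholder names, since the body of the loop at line 497 isn't shown here.

```powershell
# Sketch: bounded retry with increasing wait, instead of a fixed 'foreach ($i in 1..5)'.
# $maxAttempts and Install-MonitoringExtension are placeholders, not names from the script.
$maxAttempts = 20
for ($i = 1; $i -le $maxAttempts; $i++) {
    $success = Install-MonitoringExtension  # placeholder for the real extension call
    if ($success) { break }
    Start-Sleep -Seconds (10 * $i)  # back off progressively rather than looping forever
}
```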
We could move `Setup_SRE_Monitoring.ps1` to be the last step in `Deploy_SRE.ps1` (currently it's the penultimate step, before `Setup_SRE_Backup.ps1`).
No, I don't think so. It might just result in the script running for a very long time or indefinitely.
I'm not entirely convinced that more iterations of the loop will solve the problem. However, it would be worth testing out, with more iterations or a longer wait time between iterations.
The exception above, `Enable failed with exit code 52 Couldn't create marker file`, together with `StartTime: 27/01/2023 14:30:16 EndTime: 27/01/2023 14:30:17`, suggests it isn't a case of not waiting long enough.
Things to try:

- Running `Setup_SRE_Monitoring.ps1` with more iterations of the above loop on an SRE where everything else deployed
- Moving `Setup_SRE_Monitoring.ps1` to the last step of deployment and deploying a new SRE

The GitHub issue that @JimMadge linked to mentioned that one possible cause might be that the VM is already attached to an (incorrect) Log Analytics workspace. Can you see whether this is true for any of the VMs which are erroring, @edwardchalstrey1?
We have the same problem; we use GitHub virtual runners for our self-hosted agent pools. If you try to enable the Azure OMS extension on any Ubuntu distro, you are presented with:

VM has reported a failure when processing extension 'OMSAgentForLinux'. Error message: "Enable failed with exit code 52 Couldn't create marker file"

From what I gathered from the logs, it's a user permission issue: the installation account is unable to write to the target location when the OMS install script kicks off. We gave up on having Azure Monitor / Insights on our scale sets and moved to another monitoring agent instead of the native Azure one.
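If a stale or misconfigured agent installation turns out to be the cause, one possible workaround (untested here; the resource names are placeholders) is to remove the extension so the deployment script can reinstall it against the correct workspace:

```powershell
# Placeholder names; removes the misconfigured extension so it can be reinstalled cleanly.
Remove-AzVMExtension -ResourceGroupName "RG_SRE_SANDBOX_COMPUTE" `
                     -VMName "sre-sandbox-srd" `
                     -Name "OmsAgentForLinux" `
                     -Force
```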
:white_check_mark: Checklist

Data Safe Haven version: 4.0.2 (at least)

:computer: System information

:cactus: Powershell module versions

:no_entry_sign: Describe the problem
On deployment of an SRE, we sometimes get the below error message. This can be easily resolved by re-running the `./Setup_SRE_Monitoring.ps1` script (and, to be safe, any subsequent scripts run by `./Deploy_SRE.ps1`), but we don't know why this happens.

:deciduous_tree: Error message
:recycle: To reproduce

When running `./Deploy_SRE.ps1`, the above sometimes happens during the call of `./Setup_SRE_Monitoring.ps1`.

Re-running `./Setup_SRE_Monitoring.ps1` works: