Closed lpalovsky closed 3 weeks ago
I just realized that after failure, if I open VM overview from azure WEB ui, it says: 'Virtual machine agent status not ready'
Hi @lpalovsky , can you try rebooting the machine? And then also rerun the Control Plane Deployment pipeline?
Hi @lpalovsky , can you try rebooting the machine? And then also rerun the Control Plane Deployment pipeline?
Hello @devanshjainms. Thanks for the reply.
I tried to redeploy everything from scratch again today, but this time I used SDAF release v3.11.0.3
. (We agreed with @hdamecharla to use this as stable baseline. )
I experienced the same situation with deployment running for about an hour and then failing with same error as above and seeing the yellow banner in WebUI. After VM restart the banner disappeared but deployment script fails with the same error:
{"@level":"error","@message":"Error: updating Linux Virtual Machine (Subscription: \"<REDACTED>\"\nResource Group Name: \"<REDACTED>\"\nVirtual Machine Name: \"LAB-SECE-DEP05_labsecedep05deploy00\"): polling after Update: context deadline exceeded"
Thanks a lot for your help.
Lumir
Hello, just an update with an observation I made:
At the beginning of the deployment the VM is fine, but after Library is deployed, waagent
service goes into 'dead' state. It is possible to start it normally, but process seems to be still hanging.
● waagent.service - Azure Linux Agent
Loaded: loaded (/usr/lib/systemd/system/waagent.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2024-08-08 08:01:19 UTC; 35min ago
Condition: start condition failed at Thu 2024-08-08 08:01:19 UTC; 35min ago
Main PID: 2837 (code=exited, status=0/SUCCESS)
Aug 08 07:56:37 labsecedep05deploy00 sudo[4229]: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 08 07:58:00 labsecedep05deploy00 sudo[4368]: root : PWD=/var/lib/waagent/custom-script/download/0 ; USER=root ; COMMAND=…b-release
Aug 08 07:58:00 labsecedep05deploy00 sudo[4368]: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 08 07:58:27 labsecedep05deploy00 sudo[4936]: root : PWD=/var/lib/waagent/custom-script/download/0 ; USER=root ; COMMAND=…ive patch
Aug 08 07:58:27 labsecedep05deploy00 sudo[4936]: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 08 08:01:19 labsecedep05deploy00 systemd[1]: Stopping Azure Linux Agent...
Aug 08 08:01:19 labsecedep05deploy00 python3[2837]: 2024-08-08T08:01:19.795971Z INFO Daemon Daemon Agent WALinuxAgent-2.8.0.11 fo…2.8.0.11
Aug 08 08:01:19 labsecedep05deploy00 systemd[1]: waagent.service: Succeeded.
Aug 08 08:01:19 labsecedep05deploy00 systemd[1]: Stopped Azure Linux Agent.
Aug 08 08:01:19 labsecedep05deploy00 systemd[1]: Condition check resulted in Azure Linux Agent being skipped.
Hint: Some lines were ellipsized, use -l to show in full.
As @KimForss pointed out on today call, it might be an infra glitch, so I will try to deploy in different region. Will come back with an update.
Looks like I should be able to finish the deployer VM configuration using the configure_deployer.sh
script.
Hello. I am closing the issue since I don't experience it anymore. I deployed already at least 2 environments without problems. Might have been a temporary infra issue or something.
Thanks for the update. @lpalovsky
Describe the bug New deployment of control plane fails with an error:
polling after CreateOrUpdate: context deadline
Full message:
The error above happens somewhere in the middle of the deployment but process continues and in the end fails with:
Setup of the local machine: SLES ver. 15SP3 az cli:
terraform:
SDAF version: master branch
To reproduce Steps to reproduce the behavior: