Open abhigupta1207 opened 10 months ago
We would need to see your startup scripts in order to troubleshoot this effectively.
That being said, Restart-Service
is notoriously fragile if it races with a service that's already in the process of starting up, which may be the case here. You can try inserting the following line before any Restart-Service
to let it complete startup before attempting to restart:
(Get-Service google-cloud-ops-agent*).WaitForStatus('Running', '00:03:00')
The '00:03:00'
is a timeout so that it doesn't block forever (e.g. in case there's a bad config or some other problem preventing normal startup); adjust this timeout as you need.
Since you're already creating a custom image anyway, another thing you can try instead is change the service startup type for all Ops Agent services to Manual
. This should prevent the startup race and give your script a chance to copy in the configs before it starts up. Then you would call Start-Service
instead of Restart-Service
after copying the configs.
@jefferbrecht Thanks for getting back, we will try the below solution and will update you.
Since you're already creating a custom image anyway, another thing you can try instead is change the service startup type for all Ops Agent services to Manual. This should prevent the startup race and give your script a chance to copy in the configs before it starts up. Then you would call Start-Service instead of Restart-Service after copying the configs.
This issue was marked stale due to lack of activity. It will be closed in 14 days.
NOTE: To get the best support experience for bug fixes, please go to https://cloud.google.com/support-hub and follow the instructions. In comparison, Bug reports filed in this repo only have best effort support, and do not have guaranteed response / resolution SLOs
Describe the bug A clear and concise description of what the bug is.
To Reproduce Steps to reproduce the behavior:
Expected behavior Ops Agent should restart because our autoscaling is based on custom prometheus metrics fetched by ops-agent
Environment (please complete the following information):
INFO 2024-01-16T14:04:35.692183900Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: Restart-Service : Service 'Google Cloud Ops Agent (google-cloud-ops-agent)' cannot be stopped due to the following
INFO 2024-01-16T14:04:35.727365100Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: error: Cannot stop google-cloud-ops-agent-opentelemetry-collector service on computer '.'.
INFO 2024-01-16T14:04:35.743086300Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: At C:\UNICOM\Scripts\Specialize-IntelligenceGoogleOpsAgent.ps1:54 char:5
INFO 2024-01-16T14:04:35.774301700Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: + Restart-Service google-cloud-ops-agent -Force
INFO 2024-01-16T14:04:35.806250700Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: +
~~~~~~~~~INFO 2024-01-16T14:04:35.837901700Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: + CategoryInfo : CloseError: (System.ServiceProcess.ServiceController:ServiceController) [Restart-Service
INFO 2024-01-16T14:04:35.870150600Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: ], ServiceCommandException
INFO 2024-01-16T14:04:35.896522Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1: + FullyQualifiedErrorId : CouldNotStopService,Microsoft.PowerShell.Commands.RestartServiceCommand
INFO 2024-01-16T14:04:35.931813400Z [resource.labels.instanceId: 2851173124434244865] windows-startup-script-ps1:
Additional context Add any other context about the problem here. - Google Case 49102632