Azure / azure-powershell

Microsoft Azure PowerShell

Start-AzureRmVM cmdlet stuck in running state for hours #6223

Open square10 opened 6 years ago

square10 commented 6 years ago

Description

I'm trying to turn on a VM via PowerShell but I'm running into an odd issue. I'm using Start-AzureRmVm to start the machine. The command runs with no errors; however, even after the VM starts, the cmdlet never finishes. It's almost like it's waiting for a running status that never gets passed. I have to either stop the script using Ctrl+C or close the shell. While the command is running I can log into the Azure portal and verify that the machine is up and in a running state. I ran into this once before, deleted the VM, and started over with a new VM. This worked for a while but now the same issue is back. I can reproduce this issue on several machines, inside of a runbook, using a code editor like VS Code / ISE, and even with the native shell. Has anyone seen this before? The VM is built from the MS marketplace; I'm using one of the predefined Citrix Netscaler templates.

OSDiskName = netscalerosdisk
Publisher = citrix
Product = netscalervpx-120 (netscaler10standard)

I opened a case with Microsoft support but they said it's an issue with the cmdlet and I need to open a bug here.

Script/Steps for Reproduction

I can reproduce the issue with the code below. The line Write-Output never runs and the text is never displayed. The shell is "stuck" waiting for the Start-AzureRmVM cmdlet to finish.

[string] $sourceVM = 'vm9'
[string] $sourceResourceGroup = 'RG-Test-Production'

Write-Output "starting source vm $sourceVM"
Start-AzureRmVM -ResourceGroupName $sourceResourceGroup -Name $sourceVM
Write-Output "never writes this line..."

Module Version

Get-Module -Name AzureRM -ListAvailable

ModuleType Version    Name       ExportedCommands
---------- -------    ----       ----------------
Script     5.7.0      AzureRM

Environment Data

$PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.16299.431
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.16299.431
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

Debug Output

cormacpayne commented 6 years ago

@hyonholee @nibhat hey guys, have you seen this issue before?

@square10 as a potential workaround, you can provide the -AsJob parameter to run the command as a background job in your PowerShell process, so the shell isn't blocked and you can keep running additional commands.
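A minimal sketch of that workaround, assuming an authenticated AzureRM session. The function name Start-VmWithTimeout and the $TimeoutSeconds parameter are hypothetical and only illustrate the idea; they are not part of the AzureRM module. It starts the VM with -AsJob, waits a bounded time with Wait-Job, and leaves the job in the background if it has not finished.

```powershell
# Sketch only: Start-VmWithTimeout and $TimeoutSeconds are illustrative names,
# not part of AzureRM. Assumes you are already logged in to Azure.
function Start-VmWithTimeout {
    param(
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $Name,
        [int] $TimeoutSeconds = 600
    )

    # -AsJob returns a job object immediately instead of blocking the shell.
    $job = Start-AzureRmVM -ResourceGroupName $ResourceGroupName -Name $Name -AsJob

    # Wait-Job returns the job if it finished within the timeout, otherwise nothing.
    $finished = Wait-Job -Job $job -Timeout $TimeoutSeconds

    if ($finished) {
        # Emit whatever the cmdlet produced (normally the operation result).
        Receive-Job -Job $job
    }
    else {
        Write-Warning "Start-AzureRmVM for '$Name' is still running after $TimeoutSeconds seconds; leaving the job in the background."
    }
}
```

Usage would look like `Start-VmWithTimeout -ResourceGroupName 'RG-Test-Production' -Name 'vm9'`. The timeout only bounds how long the script waits; it does not cancel the start operation in Azure.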

square10 commented 6 years ago

Thanks @cormacpayne, that allowed the script to continue while Start-AzureRmVM ran in the background. This is such a strange issue. I can reproduce it on all of my Linux (Citrix Netscaler) VMs. So far I haven't seen it with the Windows machines.

You can see from the Get-AzureRmVM results that the provisioning state says "Updating" even though in the portal the VM is ready and the status shows as "Running".

Get-AzureRmVM -ResourceGroupName $sourceResourceGroup -Name $sourceVM

ResourceGroupName  : RG-Test-Production
Id                 : /subscriptions/a06fcc63-0592-4dfa-9a40-dcef8bc3b668/resourceGroups/RG-Test-Production/providers/Microsoft.Compute/virtualMachines/vm-test
VmId               : 75626fc9-c84b-4f09-b6a6-a1a6d7dfba4c
Name               : vm-test
Type               : Microsoft.Compute/virtualMachines
Location           : eastus2
Tags               : {}
DiagnosticsProfile : {BootDiagnostics}
Extensions         : {OmsAgentForLinux}
HardwareProfile    : {VmSize}
NetworkProfile     : {NetworkInterfaces}
OSProfile          : {ComputerName, AdminUsername, LinuxConfiguration, Secrets}
Plan               : {Name, Publisher, Product}
ProvisioningState  : Updating
StorageProfile     : {ImageReference, OsDisk, DataDisks}

Current job state:

Id     Name            PSJobTypeName   State         HasMoreData     Location             Command
--     ----            -------------   -----         -----------     --------             -------
1      Long Running... AzureLongRun... Running       True            localhost            Start-AzureRmVM
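The gap between the provisioning state and what the portal shows can be checked from the shell by asking for the instance view with the -Status switch. The following is a rough sketch, assuming an authenticated AzureRM session; Get-VmStateSummary is a hypothetical helper name, and the status codes are assumed to follow the usual 'ProvisioningState/...' and 'PowerState/...' pattern.

```powershell
# Sketch only: Get-VmStateSummary is an illustrative helper, not an AzureRM cmdlet.
function Get-VmStateSummary {
    param(
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $Name
    )

    # With -Status, Get-AzureRmVM returns the instance view, whose Statuses
    # collection carries codes such as 'ProvisioningState/updating' and
    # 'PowerState/running'.
    $view  = Get-AzureRmVM -ResourceGroupName $ResourceGroupName -Name $Name -Status
    $codes = $view.Statuses | ForEach-Object { $_.Code }

    [pscustomobject]@{
        Name              = $Name
        ProvisioningState = ($codes | Where-Object { $_ -like 'ProvisioningState/*' }) -replace '^ProvisioningState/', ''
        PowerState        = ($codes | Where-Object { $_ -like 'PowerState/*' }) -replace '^PowerState/', ''
    }
}
```

Calling `Get-VmStateSummary -ResourceGroupName $sourceResourceGroup -Name $sourceVM` while the job is still running should make it easy to see whether the VM reports a running power state while provisioning is still "updating".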

square10 commented 6 years ago

Update on the job status: it took almost an hour for the job to finish.

PS C:\Users\JesseRunowski\OneDrive - Square10 Solutions LLC\Scripts> Get-Job | Select *

State                : Completed
HasMoreData          : True
Location             : localhost
StatusMessage        : Completed
CurrentPSTransaction :
Host                 : System.Management.Automation.Internal.Host.InternalHost
Command              : Start-AzureRmVM
JobStateInfo         : Completed
Finished             : System.Threading.ManualResetEvent
InstanceId           : 2c1c9d97-1022-4bb3-9bdc-2363bf0d7684
Id                   : 1
Name                 : Long Running Operation for 'Start-AzureRmVM'
ChildJobs            : {}
PSBeginTime          : 5/16/2018 2:25:52 PM
PSEndTime            : 5/16/2018 3:16:14 PM
PSJobTypeName        : AzureLongRunningJob`1
Output               : {Microsoft.Azure.Commands.Compute.Models.PSComputeLongRunningOperation}
Error                : {}
Progress             : {}
Verbose              : {}
Debug                : {[AzureLongRunningJob]: Starting cmdlet execution, setting for cmdlet confirmation required: 'False', [AzureLongRunningJob]: Completing cmdlet execution in RunJob}
Warning              : {}
Information          : {}

square10 commented 6 years ago

Hello,

I was wondering if anyone was able to find anything on this issue? I'm going to try creating a new storage account to host the vhd file to see if that helps.

Thanks!

lukaferlez commented 6 years ago

Hi,

Same problem here. A couple of days ago, az vm start started showing strange behaviour like this. Azure PowerShell would hang with the status "running", while the Azure UI would show that the machine is running, and I can actually log in to the machine.

Is there any workaround for the issue (except running in the background)?

Thanks

square10 commented 6 years ago

@lukaferlez, so far I've been unable to find a fix for this. If I do I'll be sure to update this thread. Glad to see it's not just me running into this issue.

square10 commented 6 years ago

Any updates?

dbsanfte commented 6 years ago

This needs more attention, it happens a lot, and it breaks scripts.

maptan commented 6 years ago

Is this the same command set used by the Azure Portal? The StartAzureV2 graphical runbook uses this PowerShell command and I've got the same long-running status on the job. As a result, the automation account exceeded the 500 automation minutes and I've had to change a client's automation account type to paid. Sometimes it even takes more than 3 hours, and Azure kills the job (the VM is started just fine in a few minutes).

square10 commented 6 years ago

Hi @maptan, yes, this is actually where I first ran into the issue. It was with a non-graphical runbook, but the same issue happened to me. A job that normally took 5-10 minutes ran for 6 hours. Once I found out what the issue was I started to monitor the VM while the runbook was in progress. I could see the VM was up and running in the Azure portal but the Start-AzureRmVm cmdlet was still running. So far I've only been able to reproduce this issue on non-Windows VMs.

maptan commented 6 years ago

@square10 In my case it happens with two Windows VMs on Azure. Sometimes with one, sometimes with the other. I can't find out why the error would jump between VMs, either. I've just set up different schedules at different times for each VM as a test. I'll update here if something changes.

grazburya commented 5 years ago

Any more on this? I am facing the same issue with 40+ VMs in a resource group. These are Windows and Linux VMs. My script that should only take 45 mins is taking over 8 hours to complete.

khoi-thinh commented 5 years ago

I've been using the StartAzureV2 graphical runbook to start my VMs. There was nothing wrong with the time it took, even though it was not parallel. The biggest surprise I got was that around 50% of my VMs were in running status but got stuck in Unavailable (from Resource Health). I had never seen this behavior before I deployed the runbook (I always started VMs manually before).

square10 commented 5 years ago

We could reproduce the issue with a single VM using the simple script below.

[string] $sourceVM = 'vm9'
[string] $sourceResourceGroup = 'RG-Test-Production'

Write-Output "starting source vm $sourceVM"
Start-AzureRmVM -ResourceGroupName $sourceResourceGroup -Name $sourceVM
Write-Output "script is hung and never writes this line..."

The portal will show the VM as running; however, the cmdlet continues to run. I can only assume that the cmdlet is waiting for a response code after starting the VM that it never gets. The cmdlet will eventually time out, in some cases after 6-8 hours. To this day I am still using the -AsJob parameter and letting Azure handle the termination of the job on its own time and not on mine or my client's.

I've been trying to get help from Microsoft on this issue for almost a year now. I haven't received any updates.

patrickhowerter commented 5 years ago

I am having the same issue and it seems to have started when I upgraded to Ubuntu 18. People need to use this for automation tasks. I can't believe no one from Microsoft has been working on this.

notameadow commented 5 years ago

Just pinging here as this still seems to be an issue, was there ever any resolution to this?

square10 commented 5 years ago

Unfortunately no, this is still an issue for us. We continue to start VMs using the -AsJob parameter and then let Azure clean up the job.

I have no clue why MS won't look into this issue.

darshankumarys commented 5 years ago

Hi, I am also facing the same issue for a few of my machines. Though the machines start, the status still shows "Starting virtual machine" in the notification blade in the Azure Portal. As a result, we can't run any scripts in VSTS pipelines.

When will this bug be fixed?

shravani6 commented 4 years ago

Has the issue with the Azure command Start-AzureRmVm been solved? I still face this issue and I just came across this conversation. It is very disappointing that Azure hasn't fixed this issue.

saiganeshdk commented 4 years ago

Hi,

Also facing the same issue. It's happening with a W2012 R2 DC machine as well.

clittle1973 commented 4 years ago

I believe this issue may be affecting the WVD-Scaling Script as well.

https://github.com/Azure/RDS-Templates/tree/master/wvd-sh/WVD%20scaling%20script

johnwc commented 4 years ago

Ran into this same issue with Start-AzVM today. Linux VM, console shows as running and we can get into the VM. The PowerShell console is just waiting for Start-AzVM to come back as completed.

pgrignaffini commented 4 years ago

Same issue using the Start-AzureV2VMs runbook for automatic start of a Windows VM. After 3 days of working normally, it just suddenly decided to take up to 2 hours to start a single VM, this inconsistency is mesmerizing.

averhaegen commented 4 years ago

I'm also experiencing this issue. Our company is using an Automation Account to automatically start and stop some VMs when outside of business hours. The first VM starts in a few minutes, but the command is stuck for 1 to 2 hours, causing the later machines to be started 1 to 2 hours late. I will try the '-AsJob' workaround now.

harrydlgs commented 3 years ago

I have had the same problem. I don't know exactly what is causing this. But I solved my problem by using -AsJob (which lets the script keep running) as a parameter of Start-AzVM, and I simulated the initialization time using Start-Sleep -Seconds 30.

I'm not sure it will solve your problem, but it solved mine

ConfortiLuca commented 3 years ago

I still have the problem today. I start a virtual machine with Start-AzVM and the VM starts within 5 minutes of running the command. In the Azure Portal the state is Running and I can access the VM with ssh, but the cmdlet remains stuck for 2 hours. While the cmdlet is stuck, I used another terminal to run Get-AzVM, which returns the provisioning state of the VM as "Updating". Is there any solution (except the -AsJob workaround)?

jagadish1620 commented 2 years ago

Hi @square10, I know that you have been following this thread for a while. I ran into the same issue today. Unfortunately, I am not able to use the -AsJob parameter, as the command after it fails and the log states: az : ERROR: (Conflict) Run command extension execution is in progress. Please wait for completion before invoking a run command.

Just wanted to check if there are any other updates about this issue.

hoshinokanade commented 1 year ago

Another 2 years later, Start-AzVM still periodically gets stuck. It is a Windows VM and from the portal it is obviously started. Our scripts can get stuck for hours from time to time. It happens in almost 1 out of 3 runs. I wonder why Microsoft has not been able to reproduce it themselves for an entire 5 years. I wonder if the only workaround left is to stop using Azure.

Start-AzVM @startParameters

OperationId : (hidden)
Status      : Succeeded
StartTime   : 2/5/2023 9:35:36 am
EndTime     : 2/5/2023 11:06:01 am
Error       :

OranguTech commented 1 year ago

There's a good chance it's not just this command/cmdlet, Start-AzPolicyComplianceScan has this issue as well.

kikaomada commented 1 year ago

Just pinging here as this still seems to be an issue, was there ever any resolution to this?

lixaotec commented 5 months ago

+1

hoshinokanade commented 5 months ago

Hi everybody, I was finally able to at least solve the puzzle for my own setup.

In my setup, I found my VM was linked to a host pool with a scaling plan aggressively killing the VM under a certain condition. After disabling the scaling plan linked to the VM with another cmdlet, the Start-AzureVM cmdlet usually returns within minutes, not hours.

This is definitely not for everybody, but if your VM is connected to some sort of scaling option, disabling it before running Start-AzureVM may be a good idea.

seanatcae commented 17 hours ago

Hi, just want to add my experience.

I have numerous VMs acting as custom DevOps test agents. In a specific pipeline we manage Start and Stop VM operations using the Azure PowerShell cmdlet:

Start-AzVm

I have seen this command take 5 mins on average, but I did witness 13 mins this morning. What can I do to debug why this start operation is taking so long?

As with others, the status in the portal shows Running a lot quicker.

Update: Using -AsJob and -NoWait, then with a while loop independently checking run status I got around the problem.
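A rough sketch of what such a polling loop might look like, assuming the Az module and an authenticated session. The function name Start-VmAndWaitForPowerState, the timeout and the poll interval are hypothetical choices for illustration, not the commenter's actual script. It uses only -NoWait (rather than combining it with -AsJob) and treats the 'PowerState/running' status code from the instance view as the signal that the VM is up.

```powershell
# Sketch only: the function name, $TimeoutSeconds and $PollSeconds are
# illustrative assumptions, not part of the Az module.
function Start-VmAndWaitForPowerState {
    param(
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $Name,
        [int] $TimeoutSeconds = 900,
        [int] $PollSeconds = 15
    )

    # -NoWait submits the start request and returns without waiting for the
    # long-running operation to report completion.
    Start-AzVM -ResourceGroupName $ResourceGroupName -Name $Name -NoWait | Out-Null

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        # -Status returns the instance view; look for the PowerState/* code.
        $view  = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $Name -Status
        $power = $view.Statuses |
            Where-Object { $_.Code -like 'PowerState/*' } |
            Select-Object -First 1

        if ($power -and $power.Code -eq 'PowerState/running') {
            return $true
        }
        Start-Sleep -Seconds $PollSeconds
    }

    # Timed out without ever seeing PowerState/running.
    return $false
}
```

Polling the power state sidesteps the cmdlet's own wait, but a running power state does not guarantee that provisioning or VM extensions have finished, so follow-up steps such as run commands may still need their own retry.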