Closed arpparker closed 4 years ago
I am unable to reproduce this, but I did experience it some time ago (maybe a year) and found a workaround: using -DefaultProfile. (I turned off some of the debug output and changed the script so that I only get an error back, if any.)
This was mostly in an Azure Function App that uses runspaces, so your mileage may vary.
I actually wanted to write a blog post on this just yesterday and found that I did not need to use -DefaultProfile anymore. I did a loop of 50 iterations, queuing 5 jobs in each, and got this error 2-5 times each.
I worked with runspaces the entire day and did not see this error once.
Try the code below and see if that makes a difference.
$Global:ErrorActionPreference = 'Stop'
$DebugPreference = 'SilentlyContinue'
$AzContext = Get-AzContext
$Jobs = @()
$Jobs += Start-Job -ArgumentList $AzContext -ScriptBlock {
    param($AzContext)
    # $DebugPreference = 'Continue'
    try {
        $null = Get-AzVm -DefaultProfile $AzContext
    }
    catch {
        $_
    }
}
$Jobs += Start-Job -ArgumentList $AzContext -ScriptBlock {
    param($AzContext)
    # $DebugPreference = 'Continue'
    try {
        $null = Get-AzVm -DefaultProfile $AzContext
    }
    catch {
        $_
    }
}
$Jobs | Wait-Job | Receive-Job
@spaelling I actually am using -DefaultProfile (technically -AzContext, but that is an alias for -DefaultProfile) in the actual script that I observed this issue on and it still fails in the same way. I can consistently reproduce this issue both in my actual production script and in my example script above. My steps for reproduction are just a simpler way to illustrate what I'm doing in the actual script (which is using Set-AzVmExtension to join a VM to a domain).
EDIT: Looking at this again, since I am using -DefaultProfile in my actual script, it was an oversight leaving it out in my reproduction steps above. So much so, it's pointless passing in $AzContext if I'm not referencing it anywhere! :) I'm going to try reproducing this again using -DefaultProfile, but I suspect the results will be the same since I can reproduce the same issue in my actual production script (which already uses -DefaultProfile).
EDIT2: I've added -AzContext to the Get-AzVm commands in the reproduction steps above.
I would assume that if -DefaultProfile is not explicitly specified in the call, the cmdlet will do the equivalent of Get-AzContext, which may be what then fails, i.e. no context is available inside the job.
But again, I am seeing the same as you, although rarely (<2% of calls), when using Start-Job or equivalent.
Maybe someone has an idea on how to troubleshoot this. Can you reproduce it on a vanilla VM (some Windows image from the Azure Marketplace)?
I have a repro for this I think... Here's my code:
When I run that code - I get the following results:
It basically means that there is some intermittent issue in retrieving the profile. I'm seeing this when running Pester tests in parallel (with Start-Job) that rely on the Az module. Any time I use "-AsJob" in a cmdlet I see intermittent failures.
Can't seem to get my code formatted correctly in the prior comment, so attaching it here... Start-JobAzIssue.ps1.txt
Put the code within ```powershell code here ```
$scriptBlock = {
    $jobs = @()
    for ($i = 0; $i -lt 10; $i++) {
        $jobs += Start-Job -ScriptBlock {
            $rg = $(Get-AzResourceGroup).Count
            if (-not $rg) {
                Write-Error "Hit an issue..."
            }
            else {
                Write-Output "No problem..."
            }
        }
    }
    if ($jobs.Count -ne 0) {
        Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
        foreach ($job in $jobs) {
            $result = Receive-Job $job -Wait
            Write-Output $result
        }
        Remove-Job -Job $jobs
    }
}
$jobs = @()
for ($i = 0; $i -lt 5; $i++) {
    $jobs += Start-Job -ScriptBlock $scriptBlock
}
if ($jobs.Count -ne 0) {
    Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
    foreach ($job in $jobs) {
        $result = Receive-Job $job -Wait
        Write-Output $result
    }
    Remove-Job -Job $jobs
}
So what you are doing is basically this
(1..5 | % {Start-Job -ScriptBlock {
    (1..10 | % {Start-Job -ScriptBlock {
        $null = Get-AzResourceGroup -ErrorAction Stop
    }}) | Wait-Job | Receive-Job
}}) | Wait-Job | Receive-Job
That is nesting jobs in jobs. The above tests the same call 50 times, which fits fairly well with my observed error rate of 2%, i.e. 1 in 50 will fail.
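As a rough sanity check on that reading, here is the arithmetic under the assumption (mine, not measured anywhere in this thread) that each of the 50 calls fails independently with probability 2%. The expected number of failures is 1, and the chance of seeing at least one failure in a single 50-call run comes out at roughly 64%, so a run with no failure at all would not be unusual.

```powershell
# Assumption: each call fails independently with probability 0.02.
$p = 0.02   # observed failure rate per call
$n = 50     # 5 outer jobs x 10 nested jobs

# Expected number of failures across all calls.
$expectedFailures = $n * $p

# Probability that at least one of the calls fails.
$atLeastOne = 1 - [math]::Pow(1 - $p, $n)

"Expected failures: $expectedFailures"
"P(at least one failure): {0:P1}" -f $atLeastOne
```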
You can get a debug trace from when it fails like this
(1..5 | % {Start-Job -ScriptBlock {
    (1..10 | % {Start-Job -ScriptBlock {
        $DebugPreference = 'Continue'
        $Path = "$($env:TEMP)\20062019_$([guid]::NewGuid().Guid).txt"
        try {
            $null = Get-AzResourceGroup -ErrorAction Stop 5>&1 > $Path
            # Remove file if it did not fail
            Remove-Item $Path
        }
        catch {
            Write-Host "Failed, written debug to $Path. Error was $_"
            notepad $Path
        }
        $DebugPreference = 'SilentlyContinue'
    }}) | Wait-Job | Receive-Job
}}) | Wait-Job | Receive-Job
Maybe it is just me, but it seems to fail more often when done like this. The debug trace I am getting is this
7:48:28 AM - GetAzureResourceGroupCmdlet begin processing with ParameterSet 'GetByResourceGroupName'.
7:48:28 AM - using account id '*******************'...
[Common.Authentication]: Authenticating using Account: '*******************', environment: 'AzureCloud', tenant: '************************'
[Common.Authentication]: Authenticating using configuration values: Domain: '************************', Endpoint:
'https://login.microsoftonline.com/', ClientId: '1950a258-227b-4e31-a9cf-717495945fc2', ClientRedirect: 'urn:ietf:wg:oauth:2.0:oob', ResourceClientUri:
'https://management.core.windows.net/', ValidateAuthority: 'True'
[Common.Authentication]: Acquiring token using context with Authority 'https://login.microsoftonline.com/************************/',
CorrelationId: '00000000-0000-0000-0000-000000000000', ValidateAuthority: 'True'
[Common.Authentication]: Acquiring token using AdalConfiguration with Domain: '************************', AdEndpoint:
'https://login.microsoftonline.com/', ClientId: '1950a258-227b-4e31-a9cf-717495945fc2', ClientRedirectUri: urn:ietf:wg:oauth:2.0:oob
[ADAL]: Information: 2019-06-20T05:48:28.9565452Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: ADAL PCL.Desktop with assembly version '3.19.2.6005',
file version '3.19.50302.0130' and informational version '2a8bec6c4c76d0c1ef819b55bdc3cda2d2605056' is running...9
[ADAL]: Information: 2019-06-20T05:48:28.9565452Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: ADAL PCL.Desktop with assembly version '3.19.2.6005',
file version '3.19.50302.0130' and informational version '2a8bec6c4c76d0c1ef819b55bdc3cda2d2605056' is running...
[ADAL]: Information: 2019-06-20T05:48:28.9575461Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: === Token Acquisition started:
CacheType: null
Authentication Target: User
, Authority Host: login.microsoftonline.com
[ADAL]: Information: 2019-06-20T05:48:28.9575461Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: === Token Acquisition started:
Authority: https://login.microsoftonline.com/************************/
Resource: https://management.core.windows.net/
ClientId: 1950a258-227b-4e31-a9cf-717495945fc2
CacheType: null
Authentication Target: User
[ADAL]: Verbose: 2019-06-20T05:48:30.6175446Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Loading from cache.
[ADAL]: Verbose: 2019-06-20T05:48:30.6175446Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Loading from cache.
[ADAL]: Verbose: 2019-06-20T05:48:30.6435470Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Verbose: 2019-06-20T05:48:30.6445464Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Information: 2019-06-20T05:48:30.7415426Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Information: 2019-06-20T05:48:30.7415426Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Verbose: 2019-06-20T05:48:30.7415426Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Verbose: 2019-06-20T05:48:30.7415426Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Information: 2019-06-20T05:48:30.7425438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Information: 2019-06-20T05:48:30.7425438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Verbose: 2019-06-20T05:48:30.7425438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Verbose: 2019-06-20T05:48:30.7425438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Information: 2019-06-20T05:48:30.7425438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Information: 2019-06-20T05:48:30.7425438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Verbose: 2019-06-20T05:48:30.7475447Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Verbose: 2019-06-20T05:48:30.7475447Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Information: 2019-06-20T05:48:30.7475447Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Information: 2019-06-20T05:48:30.7475447Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Verbose: 2019-06-20T05:48:30.7475447Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Verbose: 2019-06-20T05:48:30.7475447Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Information: 2019-06-20T05:48:30.7475447Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Information: 2019-06-20T05:48:30.7485438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Verbose: 2019-06-20T05:48:30.7485438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Verbose: 2019-06-20T05:48:30.7485438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Looking up cache for a token...
[ADAL]: Information: 2019-06-20T05:48:30.7485438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Information: 2019-06-20T05:48:30.7485438Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No matching token was found in the cache
[ADAL]: Verbose: 2019-06-20T05:48:30.7615453Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No token matching arguments found in the cache
[ADAL]: Verbose: 2019-06-20T05:48:30.7625477Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: No token matching arguments found in the cache
[ADAL]: Error: 2019-06-20T05:48:30.7725451Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs: Exception type:
Microsoft.IdentityModel.Clients.ActiveDirectory.AdalSilentTokenAcquisitionException, ErrorCode: failed_to_acquire_token_silently
at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenSilentHandler.SendTokenRequestAsync()
at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.<CheckAndAcquireTokenUsingBrokerAsync>d__59.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.<RunAsync>d__57.MoveNext()
[ADAL]: Error: 2019-06-20T05:48:30.7725451Z: 45b5b5a9-4b53-4036-94a8-d695213153bb - LoggerBase.cs:
Microsoft.IdentityModel.Clients.ActiveDirectory.AdalSilentTokenAcquisitionException: Failed to acquire token silently as no token was found in the cache. Call
method AcquireToken
at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenSilentHandler.SendTokenRequestAsync()
at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.<CheckAndAcquireTokenUsingBrokerAsync>d__59.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.<RunAsync>d__57.MoveNext()
ErrorCode: failed_to_acquire_token_silently
[Common.Authentication]: Received exception Failed to acquire token silently as no token was found in the cache. Call method AcquireToken, while
authenticating.
This is fantastic, I can reproduce the issue with this consistently as well. I did modify it slightly to include passing the context as a parameter to the job, and then passing the context to the Get-AzResourceGroup command. This matches how I was encountering the issue originally, but I can get it to fail both ways, so it may not be important. I've included the full code below with my changes for reference.
But overall, this is absolutely perfect--this is definitely the best way to reproduce this issue at this point. Seems my suspicion it was time-related is unfounded. I do find it interesting though that I would generally encounter it on the first job of the day after having not run anything for 24-48 hours. Seems immaterial now, but I do find it odd.
Should I replace my reproduction steps in the original post with the code @petehauge has provided?
As referenced above, here's the full code with my small modifications:
$AzContext = Get-AzContext
$scriptBlock = {
$jobs = @()
for ($i = 0; $i -lt 10; $i++) {
$jobs += Start-Job -ArgumentList $AzContext -ScriptBlock {
param($AzContext)
$rg = $(Get-AzResourceGroup -AzContext $AzContext).Count
if (-not $rg) {
Write-Error "Hit an issue..."
}
else {
Write-Output "No problem..."
}
}
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
}
$jobs = @()
for ($i = 0; $i -lt 5; $i++) {
$jobs += Start-Job -ScriptBlock $scriptBlock
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
The code I provided shows the debug trace, but they will read the entire thread and work from that. Maybe @markcowl or @cormacpayne can comment on this?
And just to repeat a comment I made earlier, this seems not to be a problem when using runspaces, but perhaps similar testing should be done before clearing runspaces entirely.
I see the repro randomly in my code. We call Connect-AzAccount from PowerShell Azure Functions. Here is what I am doing in our PowerShell Azure function:
We randomly see the error below at step 3. The error is gone when the function is rerun.
Your Azure credentials have not been set up or have expired, please run Connect-AzAccount to set up your Azure credentials
Until it gets fixed, is there any workaround for this error?
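Since the error reportedly disappears on a rerun, one stopgap is to retry the failing call a few times. This is not a fix, and nobody from the Az team has confirmed it in this thread. The sketch below assumes the failure surfaces as a terminating error; Invoke-WithRetry, its parameters and the default values are hypothetical names made up for illustration, not part of the Az module.

```powershell
# Hypothetical helper (not part of Az): runs a script block and retries it
# when it throws, up to $MaxAttempts times.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $MaxAttempts = 3,
        [int] $DelaySeconds = 2
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Return the output of the first attempt that does not throw.
            return & $ScriptBlock
        }
        catch {
            # Out of attempts: rethrow the last error to the caller.
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Warning "Attempt $attempt failed: $_. Retrying in $DelaySeconds second(s)."
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Usage (assumes Connect-AzAccount has already run in this session).
# -ErrorAction Stop matters: only terminating errors reach the catch block.
# $vms = Invoke-WithRetry { Get-AzVM -ErrorAction Stop }
```

A retry only hides the intermittent failure; it would not help if the context is genuinely missing in the job.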
@bingbing8 @spaelling @arpparker The likely culprit here is an issue with the type converter in the job. The type converter is used in this case because the type of the cmdlet parameter is IAzureContextContainer, rather than IAzureContext.
To work around the issue, you can pass in an IAzureContextContainer. In a typical Azure environment, this would mean passing the results of running Connect-AzAccount rather than of running Get-AzContext, so:
$context = Connect-AzAccount -Subscription "My Subscription" -Tenant xxxx-xxxxxx-xxxxxx-yyyyy
$job = Start-Job -ArgumentList $context -ScriptBlock {param($AzContext) Get-AzVm -AzContext $AzContext ...}
@markcowl, below is the code we run. I didn't pass in the result of Get-AzContext. It fails randomly.
$appSecret = RetrieveSecretFromKV -SecretName $SPNId -KeyVaultName $KeyVaultName
$kvSecretBytes = [System.Convert]::FromBase64String($appSecret)
$certificate=[System.Security.Cryptography.X509Certificates.X509Certificate2]::new($kvSecretBytes, $null, [System.Security.Cryptography.X509Certificates.X509KeyStorageFlags]::MachineKeySet)
$thumbprint=$certificate.Thumbprint
Install-Certificate -Certificate $certificate -StorePath "Cert:\CurrentUser\My"
Write-Host "Connect-AzAccount..."
$Script:AzContext = Connect-AzAccount -CertificateThumbprint $thumbprint -ApplicationId $SPNID -Tenant $TenantID -ServicePrincipal -Environment $AzureEnvironmentName -SubscriptionId $SubscriptionId
get-AzVM -DefaultProfile $Script:AzContext
Note that when this runs concurrently in multiple instances of a queue-triggered Azure PowerShell function, it fails randomly. Most of the time the first trigger fails (either after new changes have been pushed or after the function has not run for a long time, like 24 hours); the run after that first failure works fine, no matter how many instances are running.
I tried this with running 5 jobs, each with 10 nested jobs, and still got an error. I think even more than usual, but could just be coincidental.
$AzContext = Connect-AzAccount -Tenant 'TENANTID' -Subscription 'SUBSCRIPTIONID'
$scriptBlock = {
param($AzContext)
$jobs = @()
for ($i = 0; $i -lt 10; $i++) {
$jobs += Start-Job -ArgumentList $AzContext -ScriptBlock {
param($AzContext)
# make sure this is not $null (will then grab from Get-AzContext)
if($null -eq $AzContext)
{
throw "Azure context is '`$null'"
}
$rg = $(Get-AzResourceGroup -AzContext $AzContext).Count
if (-not $rg) {
Write-Error "Hit an issue..."
}
else {
Write-Output "No problem..."
}
}
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs (NESTED) to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
}
$jobs = @()
for ($i = 0; $i -lt 5; $i++) {
$jobs += Start-Job -ScriptBlock $scriptBlock -ArgumentList $AzContext
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
Ha, you beat me to this by like 10 minutes! :) Was just coming to post the same, I'm getting the same results. I think I have found a workaround though, based on one that was posted in a similar issue that I linked in the original post. More to come in a few minutes...
The following, based almost entirely on this post, has worked 100% of the time for me:
Save-AzContext -Path "C:\Temp\AzContext.json" -Force
$scriptBlock = {
    $jobs = @()
    for ($i = 0; $i -lt 10; $i++) {
        $jobs += Start-Job -ScriptBlock {
            #Clear-AzContext -Force | Out-Null
            Disable-AzContextAutosave -Scope Process | Out-Null
            Import-AzContext -Path "C:\Temp\AzContext-Empty.json" | Out-Null
            Import-AzContext -Path "C:\Temp\AzContext.json" | Out-Null
            $AzContext = Get-AzContext
            $rg = $(Get-AzResourceGroup -AzContext $AzContext).Count
            if (-not $rg) {
                Write-Error "Hit an issue..."
            }
            else {
                Write-Output "No problem..."
            }
        }
    }
    if ($jobs.Count -ne 0) {
        Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
        foreach ($job in $jobs) {
            $result = Receive-Job $job -Wait
            Write-Output $result
        }
        Remove-Job -Job $jobs
    }
}
$jobs = @()
for ($i = 0; $i -lt 5; $i++) {
    $jobs += Start-Job -ScriptBlock $scriptBlock
}
if ($jobs.Count -ne 0) {
    Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
    foreach ($job in $jobs) {
        $result = Receive-Job $job -Wait
        Write-Output $result
    }
    Remove-Job -Job $jobs
}
A couple notes:
* The Clear-AzContext doesn't seem to be required (at least for this particular issue).
* If Disable-AzContextAutosave isn't present, the error "The process cannot access the file 'C:...\TokenCache.dat' because it is being used by another process." appears multiple times.
* An empty context file can be created by saving the context when there is no context (also mentioned in the post I linked above).
I don't really understand why this works, but it does seem to work. What are the potential security implications of saving the context to disk?
The implication is that you are committing the access-token to disk. That can be problematic as someone could potentially elevate their access in Azure by having access to this file.
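One way to shrink that exposure is to write the context to a uniquely named temp file and delete it in a finally block once the jobs have finished. This is a sketch under the assumption that the jobs only need the file while they import it. Invoke-WithTemporaryFile and the AzContext_<guid>.json file name are hypothetical, made up for illustration; only the Az cmdlets in the commented usage come from this thread.

```powershell
# Hypothetical helper (not part of Az): passes a unique temp file path to a
# script block and always deletes the file afterwards, even if the block throws.
function Invoke-WithTemporaryFile {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock
    )
    $path = Join-Path ([System.IO.Path]::GetTempPath()) ("AzContext_{0}.json" -f [guid]::NewGuid())
    try {
        # The script block receives the path as its first argument.
        & $ScriptBlock $path
    }
    finally {
        if (Test-Path -LiteralPath $path) {
            Remove-Item -LiteralPath $path -Force
        }
    }
}

# Usage (assumes an existing Az login). The job must finish importing the
# context before the helper returns, hence the Wait-Job inside the block.
# Invoke-WithTemporaryFile {
#     param($ContextPath)
#     Save-AzContext -Path $ContextPath -Force
#     $job = Start-Job -ArgumentList $ContextPath -ScriptBlock {
#         param($ContextPath)
#         Disable-AzContextAutosave -Scope Process | Out-Null
#         Import-AzContext -Path $ContextPath | Out-Null
#         (Get-AzResourceGroup).Count
#     }
#     $job | Wait-Job | Receive-Job
# }
```

This only limits how long the token sits on disk; anyone who can read the temp directory while the jobs run can still copy the file.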
I find it odd that you have to import an empty context. I have done the same as you, but passing the access token and an account ID along instead. This still fails, which is odd, since it should be pretty much equivalent to logging in as a service principal.
<#
.SYNOPSIS
Get cached access token
.DESCRIPTION
Get cached access token
.EXAMPLE
An example
.NOTES
This will fail if multiple accounts are logged in (to the same tenant?), check with Get-AzContext -ListAvailable, there should be only one listed
Remove accounts using Disconnect-AzAccount
#>
function Get-AzCachedAccessToken()
{
$ErrorActionPreference = 'Stop'
if(-not (Get-Module Az.Accounts)) {
Import-Module Az.Accounts
}
$azProfile = [Microsoft.Azure.Commands.Common.Authentication.Abstractions.AzureRmProfileProvider]::Instance.Profile
if(-not $azProfile.Accounts.Count) {
Write-Error "Ensure you have logged in before calling this function."
}
$currentAzureContext = Get-AzContext
$profileClient = New-Object Microsoft.Azure.Commands.ResourceManager.Common.RMProfileClient($azProfile)
Write-Debug ("Getting access token for tenant " + $currentAzureContext.Tenant.TenantId)
$token = $profileClient.AcquireAccessToken($currentAzureContext.Tenant.TenantId)
$token.AccessToken
}
$Token = Get-AzCachedAccessToken
$AccountId = (Get-AzContext).Account.Id
#Connect-AzAccount -AccessToken $Token -AccountId $AccountId
cls
$scriptBlock = {
param($Token, $AccountId)
$jobs = @()
for ($i = 0; $i -lt 5; $i++) {
$jobs += Start-Job -ArgumentList $Token, $AccountId -ScriptBlock {
param($Token, $AccountId)
Disable-AzContextAutosave -Scope Process | Out-Null
$AzContext = Connect-AzAccount -AccessToken $Token -AccountId $AccountId -Scope Process -ErrorAction SilentlyContinue
if($null -eq $Token -or $null -eq $AzContext)
{
throw "Azure Token/Context is '`$null'"
}
$rg = $(Get-AzResourceGroup -AzContext $AzContext -ErrorAction SilentlyContinue).Count
if (-not $rg) {
Write-Error "Hit an issue..."
}
else {
Write-Output "No problem..."
}
}
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs (NESTED) to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
}
$jobs = @()
for ($i = 0; $i -lt 10; $i++) {
$jobs += Start-Job -ScriptBlock $scriptBlock -ArgumentList $Token, $AccountId
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
@spaelling, it looks like your workaround works when the logins in the different jobs use the same context, so you can write the context to disk and read it back. In my case, we log in to different tenants with different subscriptions in different functions, so it does not work that way.
I tried all the workarounds mentioned here, but none of them seems to work. I will try reverting to an older version of Az PowerShell and will update here...
I've been struggling with this for a week trying to upgrade our existing deployment scripts from AzureRM to Az. None of the workarounds posted here are working for us. Neither exporting/importing the context to a file nor passing the context to the scriptblock works. We make multiple calls to azure endpoints within the scriptblocks in parallel and we won't be able to finish migrating to Az until this works.
I had hoped that runspaces would not have this issue, but alas, the same.
cls
<#
.SYNOPSIS
Get cached access token
.DESCRIPTION
Get cached access token
.EXAMPLE
An example
.NOTES
This will fail if multiple accounts are logged in (to the same tenant?), check with Get-AzContext -ListAvailable, there should be only one listed
Remove accounts using Disconnect-AzAccount
#>
function Get-AzCachedAccessToken()
{
$ErrorActionPreference = 'Stop'
if(-not (Get-Module Az.Accounts)) {
Import-Module Az.Accounts
}
$azProfile = [Microsoft.Azure.Commands.Common.Authentication.Abstractions.AzureRmProfileProvider]::Instance.Profile
if(-not $azProfile.Accounts.Count) {
Write-Error "Ensure you have logged in before calling this function."
}
$currentAzureContext = Get-AzContext
$profileClient = New-Object Microsoft.Azure.Commands.ResourceManager.Common.RMProfileClient($azProfile)
Write-Debug ("Getting access token for tenant " + $currentAzureContext.Tenant.TenantId)
$token = $profileClient.AcquireAccessToken($currentAzureContext.Tenant.TenantId)
$token.AccessToken
}
$Token = Get-AzCachedAccessToken
$AccountId = (Get-AzContext).Account.Id
$scriptBlock = {
param($Token, $AccountId, $j, $Runs)
# Write-Host "Inner: Setting up runspaces"
$sessionstate = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
$sessionstate.Variables.Add(
(New-Object System.Management.Automation.Runspaces.SessionStateVariableEntry('Results', $Results, $null))
)
$runspacepool = [runspacefactory]::CreateRunspacePool(1, [int]$env:NUMBER_OF_PROCESSORS+1, $sessionstate, $Host)
$runspacepool.Open()
$runspaces = @()
# Write-Host "Inner: Invoking $Runs runspaces"
for ($i = 0; $i -lt $Runs; $i++) {
$runspace = [powershell]::Create()
$runspace.RunspacePool = $runspacepool
$runspace.AddScript({
param($Token, $AccountId, $i, $j)
Disable-AzContextAutosave -Scope Process | Out-Null
$null = Connect-AzAccount -AccessToken $Token -AccountId $AccountId -Scope Process -ErrorAction SilentlyContinue
if($null -eq $Token)
{
$msg = "Azure Token is '`$null'"
$Results["$j-$i"] = $msg
throw $msg
}
$rg = $null
try {
$rg = (Get-AzResourceGroup -ErrorAction Stop).Count
}
catch {
$Results["$j-$i"] = "$_"
}
if ($null -ne $rg) {
# spammy!
$Results["$j-$i"] = "Found $rg resource groups"
}
}).AddParameter('Token', $Token).AddParameter('AccountId', $AccountId).AddParameter('i', $i).AddParameter('j', $j) > $null
$runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $runspace.BeginInvoke() }
}
# Write-Host "Inner: Waiting for runspaces to complete"
while($runspaces.Status.IsCompleted -contains $false){Start-Sleep -Milliseconds 10}
}
# controls how many jobs to run
$Runs = 10
$RunsNested = 5
Write-Host "Connecting and fetching resource groups in Azure $($Runs*$RunsNested) times"
Write-Host "Setting up runspaces"
$sessionstate = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
$Results = [HashTable]::Synchronized(@{})
$sessionstate.Variables.Add(
(New-Object System.Management.Automation.Runspaces.SessionStateVariableEntry('Results', $Results, $null))
)
$runspacepool = [runspacefactory]::CreateRunspacePool(1, [int]$env:NUMBER_OF_PROCESSORS+1, $sessionstate, $Host)
$runspacepool.Open()
$runspaces = @()
Write-Host "Invoking $Runs runspaces"
for ($i = 0; $i -lt $Runs; $i++) {
$runspace = [powershell]::Create()
$runspace.RunspacePool = $runspacepool
$runspace.AddScript($scriptBlock).AddParameter('Token', $Token).AddParameter('AccountId', $AccountId).AddParameter('j', $i).AddParameter('Runs', $RunsNested) > $null
$runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $runspace.BeginInvoke() }
}
Write-Host "Waiting for runspaces to complete"
while($runspaces.Status.IsCompleted -contains $false){Start-Sleep -Milliseconds 10}
foreach ($Result in $Results.GetEnumerator()) {
Write-Output $Result
}
The process cannot access the file TokenCache.dat because it is being used by another process
Enable-AzureRmContextAutosave -Scope CurrentUser
to work around that, I get Your Azure credentials have not been set up or have expired, please run Connect-AzAccount to set up your Azure credentials.
Enable-AzureRmContextAutosave
is in effect doesn't help: once a job has started with a "bad" context, they all fail. So my solution is importing the profile under a lock:
$mutex = [System.Threading.Mutex]::new($false, "foo")
$mutex.WaitOne()
try
{
Import-AzContext -Path $ProfilePath
}
finally
{
$mutex.ReleaseMutex()
}
EDIT - looks like retries are still necessary; I guess Import-AzContext keeps the file open. Maybe Clear-AzContext, I'll try.
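A minimal sketch of the retry idea mentioned in the EDIT above. `Invoke-WithRetry` and its parameters are hypothetical names introduced for illustration; they are not part of the Az module. The helper only retries terminating errors, so the wrapped command would need `-ErrorAction Stop`.

```powershell
# Hypothetical helper (not part of Az): runs a script block and retries it
# when it throws, sleeping between attempts.
function Invoke-WithRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 5,
        [int] $DelayMilliseconds = 500
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Collect the output first so a failed attempt emits nothing.
            $result = & $Action
            return $result
        }
        catch {
            # Give up and rethrow the original error after the last attempt.
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Verbose "Attempt $attempt failed: $($_.Exception.Message)"
            Start-Sleep -Milliseconds $DelayMilliseconds
        }
    }
}

# Possible use with the locked import above (assumes Az.Accounts is installed
# and $ProfilePath points at a saved context file):
# Invoke-WithRetry -Action { Import-AzContext -Path $ProfilePath -ErrorAction Stop }
```

Whether retrying is enough depends on why the file is still held open, which this thread does not settle.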
Hello,
FWIW, I wanted to mention that we too are encountering a similar issue while running our PowerShell code in Azure Functions. Similar to @bingbing8, our use case is to log in to several different tenants/subscriptions for management, and we are randomly experiencing this same issue.
@isra-fel, could you take a look at this issue?
I'm on a Mac in powershell and every single time I switch context I have to reauthenticate. 100% of the time. It's very frustrating - the docs outline switching context being a seamless experience, but it doesn't seem to work as intended.
I had the same issue when running jobs in multiple threads via a "ForEach-Parallel" function.
This is how I at least solved it, including the function as well (not written by me).
Whenever the error message appears, I retry connecting. This has worked for me 100% of the time.
function ForEach-Parallel {
<#
.SYNOPSIS
A parallel ForEach that uses runspaces
.PARAMETER ScriptBlock
ScriptBlock to execute for each InputObject
.PARAMETER ScriptFile
Script file to execute for each InputObject
.PARAMETER InputObject
Object(s) to run script against in parallel
.PARAMETER Throttle
Maximum number of threads to run at one time. Default: 5
.PARAMETER Timeout
Stop each thread after this many minutes. Default: 0
WARNING: This parameter should be used as a failsafe only
Set it for roughly the entire duration you expect for all threads to complete
.PARAMETER SleepTimer
When looping through open threads, wait this many milliseconds before looping again. Default: 200
.EXAMPLE
(0..50) | ForEach-Parallel -Throttle 4 { $_; sleep (Get-Random -Minimum 0 -Maximum 5) }
Send the number 0 through 50 to scriptblock. For each, display the number and then sleep for 0 to 5 seconds. Only execute 4 threads at a time.
.EXAMPLE
$servers | Foreach-Parallel -Throttle 20 -Timeout 60 -sleeptimer 200 -verbose -scriptFile C:\query.ps1
Run query.ps1 against each computer in $servers. Run 20 threads at a time, timeout a thread if it takes longer than 60 minutes to run, give verbose output.
.FUNCTIONALITY
PowerShell Language
.NOTES
Credit to Tome Tanasovski
http://powertoe.wordpress.com/2012/05/03/foreach-parallel/
#>
[cmdletbinding()]
param(
[Parameter(Mandatory = $false, position = 0, ParameterSetName = 'ScriptBlock')]
[System.Management.Automation.ScriptBlock]$ScriptBlock,
[Parameter(Mandatory = $false, ParameterSetName = 'ScriptFile')]
[ValidateScript( { test-path $_ -pathtype leaf })]
$scriptFile,
[Parameter(Mandatory = $true, ValueFromPipeline = $true)]
[PSObject]$InputObject,
[int]$Throttle = 5,
[double]$sleepTimer = 200,
[double]$Timeout = 0
)
BEGIN {
#Build the scriptblock depending on the parameter used
switch ($PSCmdlet.ParameterSetName) {
'ScriptBlock' { $ScriptBlock = $ExecutionContext.InvokeCommand.NewScriptBlock("param(`$_)`r`n" + $Scriptblock.ToString()) }
'ScriptFile' { $scriptblock = [scriptblock]::Create($(get-content $scriptFile | out-string)) }
Default { Write-Error ("Must provide ScriptBlock or ScriptFile"); Return }
}
#Define the initial sessionstate, create the runspacepool
Write-Verbose "Creating runspace pool with $Throttle threads"
$sessionState = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
$pool = [Runspacefactory]::CreateRunspacePool(1, $Throttle, $sessionState, $host)
$pool.open()
#array to hold details on each thread
$threads = @()
#If inputObject is bound get a total count and set bound to true
$bound = $false
if ( $PSBoundParameters.ContainsKey("inputObject") ) {
$bound = $true
$totalCount = $inputObject.count
}
}
PROCESS {
$run = @'
#For each pipeline object, create a new powershell instance, add to runspacepool
$powershell = [powershell]::Create().addscript($scriptblock).addargument($InputObject)
$powershell.runspacepool=$pool
$startTime = get-date
#add references to inputobject, instance, handle and startTime to threads array
$threads += New-Object psobject -Property @{
Object = $inputObject;
instance = $powershell;
handle = $powershell.begininvoke();
startTime = $startTime
}
Write-Verbose "Added $inputobject to the runspacepool at $startTime"
'@
#Run the here string. Put it in a foreach loop if it didn't come from the pipeline
if ($bound) {
$run = $run -replace 'inputObject', 'object'
foreach ($object in $inputObject) {
Invoke-Expression -command $run
}
}
else {
Invoke-Expression -command $run
}
}
END {
$notdone = $true
#Loop through threads.
while ($notdone) {
$notdone = $false
for ($i = 0; $i -lt $threads.count; $i++) {
$thread = $threads[$i]
if ($thread) {
#If thread is complete, dispose of it.
if ($thread.handle.iscompleted) {
Write-verbose "Closing thread for $($thread.Object)"
$thread.instance.endinvoke($thread.handle)
$thread.instance.dispose()
$threads[$i] = $null
}
#Thread exceeded maxruntime timeout threshold
elseif ( $Timeout -ne 0 -and ( (get-date) - $thread.startTime ).totalminutes -gt $Timeout ) {
Write-Error "Closing thread for $($thread.Object): Thread exceeded $Timeout minute limit" -TargetObject $thread.Object
$thread.instance.dispose()
$threads[$i] = $null
}
#Thread is running, loop again!
else {
$notdone = $true
}
}
}
#Sleep for specified time before looping again
Start-Sleep -Milliseconds $sleepTimer
}
$pool.close()
}
}
$resources = Get-AzResource -ResourceGroupName "XXX-rg" | Where { $_.ResourceType -eq 'Microsoft.Compute/virtualMachines' -or $_.ResourceType -eq 'Microsoft.Compute/virtualMachineScaleSets' -or $_.Type -eq 'Microsoft.Network/applicationGateways' }
if ($resources.length -gt 0) {
Write-Output "`nGetting resources:"
$resources | Foreach-Parallel -Throttle $resources.count -Timeout 600 -sleeptimer 200 {
Write-Output "Fetching: $($_.name)"
$count = 0
do {
try {
#Just test fetching RG's
$rg = (Get-AzResourceGroup -ErrorAction Stop).Count
Write-Output $rg
if ($_.type -eq "Microsoft.Compute/virtualMachines") {
$item = Get-AzVM -Name $_.name -ResourceGroupName $_.ResourceGroupName -ErrorAction Stop
Write-Output $item.Name
$success = $true
}
elseif ($_.type -eq "Microsoft.Compute/virtualMachineScaleSets") {
$item = Get-AzVmss -VMScaleSetName $_.name -ResourceGroupName $_.ResourceGroupName -ErrorAction Stop
Write-Output $item.Name
$success = $true
}
elseif ($_.type -eq "Microsoft.Network/applicationGateways") {
$item = Get-AzApplicationGateway -ResourceGroupName $_.ResourceGroupName -Name $_.name -Verbose -ErrorAction Stop
Write-Output $item.Name
$success = $true
}
}
catch {
if ($_.Exception.Message -eq "Your Azure credentials have not been set up or have expired, please run Connect-AzAccount to set up your Azure credentials.") {
write-Output $_.Exception.Message
Write-Output "Will try to run Connect-AzAccount again"
Connect-AzAccount -ServicePrincipal -Credential $pscred -TenantId $tenantid
Select-AzSubscription -SubscriptionName $subscriptionName
}
else {
write-Output "Unknown Error:"
write-Output $_.Exception.Message
}
}
$count++
}until($count -eq 10 -or $success)
if (-not($success)) { exit }
}
}
Also getting random occurrences of this error. Very frustrating.
Same thing here since moving from PowerShell 5 and AzureRM to PowerShell Core with Az module. Everything is running on Windows.
Just converted a customer's AzureRM-based Workflow Runbook to Az, and am encountering the same issue.
Same issue here. Randomly encounter the expired Azure credential issue in background jobs started with Start-Job. Not necessarily the first one that finishes.
Had the same issue here with a "ForEach-Object -Parallel" loop. Interestingly enough, if I add a "Start-Sleep -Seconds (Get-Random -Maximum 10)" to my loop I no longer get this, so I'm wondering if something is causing some form of block either locally or on the Azure APIs.
At least for the Foreach-Object -Parallel, this is likely the issue: #11201
The ForEach-Object -Parallel issue was fixed in #12041 and the fix will be in our next release, but the original issue seemed to have a different root cause. Still digging into it.
@arpparker , could you check if the latest Az can reproduce the problem?
@dingmeng-xue - I just upgraded to the latest version of Azure Powershell with this command (admin window): Install-Module -Name Az -AllowClobber -Force
And retried my repro from above and still see the same issue.
Here's the code:
$scriptBlock = {
$jobs = @()
for ($i = 0; $i -lt 10; $i++) {
$jobs += Start-Job -ScriptBlock {
$rg = $(Get-AzResourceGroup).Count
if (-not $rg) {
Write-Error "Hit an issue..."
}
else {
Write-Output "No problem..."
}
}
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
}
$jobs = @()
for ($i = 0; $i -lt 10; $i++) {
$jobs += Start-Job -ScriptBlock $scriptBlock
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
I'll give my reproduction steps a go tomorrow morning. But I can confirm that reproduction steps from @petehauge do indeed still fail for me. If I recall correctly, we aren't passing the Azure context into the script block in quite the same way though, so I'll definitely see if I can reproduce using my original steps. Stay tuned.
EDIT: Also, just to confirm, the latest version of the Az module should be v4.3.0, correct? That's what I'm seeing after running the Update-Module command for Az.
@dingmeng-xue, unfortunately the issue is not resolved. I was able to replicate the problem again using both my reproduction steps above and in the original script where I first discovered the issue.
As mentioned above, the version of Az installed is v4.3.
I believe this might be related to locks. Some of the jobs cannot get access to the token cache file, so they fall back to in-memory mode, which of course is empty and contains no access tokens, hence the error.
And the reason why some jobs cannot get access is that we use a lock to protect the cache file, but it only works on the thread level, while PowerShell jobs are process-based.
I'm trying to figure out a solution. Will keep updating.
Yes, I agree that it's probably related to locks. I was able to develop a workaround that seems to always work using a mutex before making any calls to Azure in each thread - the code is below. This tells me that as long as no jobs access the token cache at the same time they don't fail...
$scriptBlock = {
$jobs = @()
for ($i = 0; $i -lt 10; $i++) {
$jobs += Start-Job -ScriptBlock {
# WORKAROUND: https://github.com/Azure/azure-powershell/issues/9448
$Mutex = New-Object -TypeName System.Threading.Mutex -ArgumentList $false, "Global\AzDtlLibrary"
$Mutex.WaitOne() | Out-Null
Enable-AzContextAutosave -Scope Process | Out-Null
$rg = Get-AzResourceGroup | Out-Null
$Mutex.ReleaseMutex() | Out-Null
$rg = $(Get-AzResourceGroup).Count
if (-not $rg) {
Write-Error "Hit an issue..."
}
else {
Write-Output "No problem..."
}
}
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
}
$jobs = @()
for ($i = 0; $i -lt 10; $i++) {
$jobs += Start-Job -ScriptBlock $scriptBlock
}
if($jobs.Count -ne 0)
{
Write-Output "Waiting for $($jobs.Count) test runner jobs to complete"
foreach ($job in $jobs){
$result = Receive-Job $job -Wait
Write-Output $result
}
Remove-Job -Job $jobs
}
@isra-fel Thanks for the update! Looking forward to what you find.
@petehauge This sounds very promising, but this might extend beyond my level of expertise. Can you explain what exactly this is doing, I'm not sure I'm following. How would I (if possible) incorporate into my initial reproduction script in the original post considering I'm passing the Azure context as a parameter to the script block?
@arpparker - sure! Basically, the code ensures that the first line of code that needs to get a context runs only one at a time, by using a single mutex. I.e., you could incorporate this into your code this way:
$context = Connect-AzAccount -Subscription "My Subscription" -Tenant xxxx-xxxxxx-xxxxxx-yyyyy
$job = Start-Job -ArgumentList $context -ScriptBlock {
param($AzContext)
# WORKAROUND: https://github.com/Azure/azure-powershell/issues/9448
$Mutex = New-Object -TypeName System.Threading.Mutex -ArgumentList $false, "Global\MyCustomMutex"
$Mutex.WaitOne() | Out-Null
# Put only the first line of AZ powershell code here
# this ensures that the first time on the thread we check tokens only one at a time
$vms = Get-AzVm -AzContext $AzContext
$Mutex.ReleaseMutex() | Out-Null
# Additional code goes here
}
NOTE: I didn't test the code, but this is about what you would need... The first Az command in a script block needs to be guarded by a mutex so it executes only one at a time across all the threads.
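The same guard can be sketched as a small helper that releases the mutex even when the guarded command throws. This is an illustration under assumptions, not the commenter's code: `Invoke-WithNamedMutex` is a hypothetical name, and the mutex name passed to it is arbitrary as long as every job uses the same one.

```powershell
# Hypothetical helper: runs a script block while holding a named mutex,
# which is visible across processes (and therefore across Start-Job jobs).
function Invoke-WithNamedMutex {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [scriptblock] $Action
    )
    $mutex = [System.Threading.Mutex]::new($false, $Name)
    $held = $false
    try {
        try {
            # Blocks until no other process or thread holds the mutex.
            $held = $mutex.WaitOne()
        }
        catch [System.Threading.AbandonedMutexException] {
            # A previous owner exited without releasing; this caller now owns it.
            $held = $true
        }
        & $Action
    }
    finally {
        # Always release, even if the action threw.
        if ($held) { $mutex.ReleaseMutex() }
        $mutex.Dispose()
    }
}

# Possible use inside a job (assumes Az is installed and $AzContext was passed in):
# $vms = Invoke-WithNamedMutex -Name 'Global\MyCustomMutex' -Action { Get-AzVm -AzContext $AzContext }
```

Only the first Az call needs to go through the helper; later calls in the same job can run outside the lock, as described above.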
Hi all, We've released Az.Accounts 1.9.1 fixing this issue. Could you update and try it? Thank you 😀
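To confirm that the fixed version is actually installed before retesting, a check along these lines may help. `Test-ModuleMinimumVersion` is a hypothetical helper name; it only looks at installed versions via `Get-Module -ListAvailable` and does not verify which version a given job ends up loading.

```powershell
# Hypothetical helper: returns $true when the newest installed version of a
# module is at least the requested minimum, $false otherwise.
function Test-ModuleMinimumVersion {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [version] $MinimumVersion
    )
    # Pick the highest installed version, if any.
    $installed = Get-Module -ListAvailable -Name $Name |
        Sort-Object -Property Version -Descending |
        Select-Object -First 1
    return ($null -ne $installed) -and ($installed.Version -ge $MinimumVersion)
}

# Possible use for this issue:
# if (-not (Test-ModuleMinimumVersion -Name 'Az.Accounts' -MinimumVersion '1.9.1')) {
#     Update-Module -Name Az.Accounts
# }
```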
This is such a relief! I ran through my repro above and it works, the issue looks like it's fixed! I really appreciate getting this fixed, I'm going to go throw away all my workaround code... :-) Thanks!
All right, I'll close the issue. Thanks to everyone who provided information to help resolve it. Really appreciate that!
I can also confirm this appears to be fixed for me as well.
@petehauge thanks for the explanation of the workaround a few weeks ago, I never even got a chance to try it. This is even better! :)
Thanks everyone!
This issue has reappeared since yesterday (23rd of July 2020). Upgrading Az.Accounts to 1.9.1 doesn't help.
@PavelPikat, many reasons may lead to this error. Could you raise a new issue so we can continue following up? If you can clarify your steps further, that would be great.
It seems to be Azure DevOps pipeline specific, so I raised https://github.com/microsoft/azure-pipelines-tasks/issues/13348 with them
When I was deploying a resource using ARM, I encountered a similar issue. It was resolved by running Clear-AzContext -Force.
Description
This issue is very similar to several previous issues here, here, and here. When passing the current Azure context to the Start-Job command, the first job that completes will often fail with the error message, "Your Azure credentials have not been set up or have expired, please run Connect-AzAccount to set up your Azure credentials". Subsequent commands complete successfully.
Steps to reproduce
Finding a way to consistently reproduce this has nearly driven me mad. I fully realize the steps below may seem oddly specific, but what I've outlined is the only way I've been able to reliably and consistently reproduce the issue. There may very well be a better way to reproduce it (or a way with fewer steps), but this method works for me every time.
Environment data
Module versions
Debug output
NOTE: Any potentially private information has been blanked out with 'xxx'. If anything that was blanked out is needed, please contact me privately.
Error output