Open KirkMunro opened 6 years ago
One more detail that may be relevant: the script in question is invoked in the context of a service principal that has full rights over the subscription.
@KirkMunro Hey Kirk, thanks for filing this issue. This delay is something that we've seen ourselves and have had to adapt to in certain circumstances (e.g., assigning a role to a newly created service principal).
@darshanhs90 Hey Haridarshan, do you know what could be causing this delay on the RBAC side? It seems it takes 15-30 seconds for a new AD object to be "found" client side after creation.
@RBACAsk for FYI
You are most likely hitting AAD replication lag issues. We have added a backlog item on PowerShell side to improve the experience where you create a service principal and then assign a role to it.
@surabhi-pandey: Thank you for creating the backlog item. In @cormacpayne's case, it was with a service principal. In my case, it was with an AD user. I just wanted to draw attention to the fact that this is not just about service principals.
Also, is there a way to work around this replication lag? If I had the actual object ID instead of the UPN, would that be guaranteed to work immediately after creation? Or would there still be potential replication lag issues in that scenario as well?
Hi Kirk, The ObjectID won’t help in this case. The original create call to AAD Graph returns some response headers that can help with this. We can look into exposing them in the create user/servicePrincipal cmdlets and add a parameter for these in the role assignment create cmdlet. Thx Surabhi
From: Kirk Munro notifications@github.com Sent: Friday, June 29, 2018 1:46 PM To: Azure/azure-powershell azure-powershell@noreply.github.com Cc: RBAC Ask RBACAsk@microsoft.com; Mention mention@noreply.github.com Subject: Re: [Azure/azure-powershell] First lookup of new AD user fails? (#6493)
@RBACAsk: That sounds like a very specific solution to a general problem. What if I was doing something else with the user account? What if I was working with an AD application, AD group, or AD group member? I suspect the majority of these could be affected by replication delays. And with those, in this case I'm assigning a user role, but what else might I be doing that requires the replication to be complete before the command runs? Maybe I'm setting an Azure key vault access policy. Or something else.
Plus, in my case, I don't invoke the code to create the user. Internal/backend logic in a microservice does that. I just invoke some PowerShell to do things with the user (or service principal, or maybe group) that gets created. With that in place, I don't know that returning something in the response header would help (although if it's some value I could find a way to capture that in the microservice and then use it in place of object id when assigning a new role).
Is that what you have in mind? A value that can be passed in to another cmdlet in place of objectid, spn, etc. that will cause whatever cmdlet I'm using that works with the AAD object I'm referencing to succeed even if AAD replication hasn't completed yet?
As long as this is resolved in a general manner, and not just specific to this issue, that's what I'm looking for here.
Generally speaking, I think if someone is writing PowerShell scripts using an object id, then it should be assumed that the object exists, and the wait loop logic (with timeout) should simply happen automatically, either internally within Azure PowerShell, or ideally, behind the scenes in Azure itself. Unless I'm mistaken, nobody is going to have manually typed object ids in their code, right? AFAIK, there is no such thing as a well-known object id in Azure (and if there was, those would exist, so it doesn't matter). Is the creation of yet another property/parameter necessary for this, or can one assume when using object id that the call should have wait/timeout logic in that case?
A few other ideas that came to mind when I ran into this were a generally applied -Wait parameter for commands that may be run against AD objects that have not replicated yet, or a set of Wait-AzureRmAD* cmdlets that wait for an object to exist and return an error if a timeout occurs. By far I prefer the other approaches suggested, though; object id is a pretty solid way of identifying that the object does exist.
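The wait-with-timeout idea above could be sketched as a small polling helper. This is only an illustration under stated assumptions: `Wait-ForCondition` and its parameters are hypothetical names, not part of Azure PowerShell, and the condition script block stands in for whatever lookup the caller cares about.

```powershell
# Hypothetical helper (not part of Azure PowerShell): polls a condition until it
# returns a truthy value, or throws once the timeout has elapsed.
function Wait-ForCondition {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [scriptblock] $Condition,
        [int] $TimeoutSeconds = 60,
        [int] $PollIntervalMilliseconds = 1000
    )

    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    while ($true) {
        # A terminating error from the condition is treated as "not ready yet".
        $result = $null
        try { $result = & $Condition } catch { $result = $null }
        if ($result) { return $result }

        if ($stopwatch.Elapsed.TotalSeconds -ge $TimeoutSeconds) {
            throw "Timed out after $TimeoutSeconds seconds waiting for the condition."
        }
        Start-Sleep -Milliseconds $PollIntervalMilliseconds
    }
}
```

A caller might pass a directory lookup as the condition, for example `Wait-ForCondition -Condition { Get-AzADUser -UserPrincipalName $upn }`, where `$upn` is a placeholder. A successful lookup on the client side does not necessarily mean every downstream service can already see the object, so this kind of wait may still need to be combined with retries on the dependent call.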
Hi Kirk, AAD Graph APIs already return some information in the response headers that can be used in subsequent calls to indicate that the entity was recently created and help route the request to nodes where the entity has been replicated. Subsequent calls to AAD Graph are still made with the objectId, but you also pass this info in the request headers.
We will look into how to make the Azure RBAC cmdlets support these.
Thx Surabhi
From: Kirk Munro notifications@github.com Sent: Wednesday, July 4, 2018 7:19 AM To: Azure/azure-powershell azure-powershell@noreply.github.com Cc: RBAC Ask RBACAsk@microsoft.com; Mention mention@noreply.github.com Subject: Re: [Azure/azure-powershell] First lookup of new AD user fails? (#6493)
I was unable to repro this delay. It looks like it should no longer be a problem due to a change in the PAS service. The update should be available with the new Az PowerShell cmdlet release next week.
Have you continued to see this issue? If not, I will be closing it on Friday.
I'm no longer able to verify this using the logic where it was originally an issue. @James-Burnham may be able to confirm whether or not this is still an issue because he also ran into it and was using my workaround.
I just ran into this today. If I create an AAD group and then immediately try to assign an eligible role to it, I get "subject not found". If I wait a bit, it works. I don't know how to verify when the group is available to the command other than to keep retrying until it works. Get-AzAdGroup works immediately.
VERBOSE: Performing the operation "Grant eligible role Contributor to Azure AD Group ResourceGroup - np-sftpStorageAccount-dev" on target "IdNow-PIM-ResourceGroup-np-sftpStorageAccount-dev".
Error creating eligible role Contributor for ResourceGroup - np-sftpStorageAccount-dev: The subject is not found.
Raw JSON from the API:
PUT https://management.azure.com/subscriptions/xxx/resourceGroups/np-sftpStorageAccount-dev/providers/Microsoft.Authorization/roleEligibilityScheduleRequests/xxx?api-version=2020-10-01-preview HTTP/1.1
{
  "properties": {
    "scheduleInfo": {
      "expiration": {
        "duration": "P3650D"
      },
      "startDateTime": "2022-12-23T18:42:46.8659717Z"
    },
    "roleDefinitionId": "/subscriptions/xxx/resourceGroups/np-sftpStorageAccount-dev/providers/Microsoft.Authorization/roleDefinitions/xxx",
    "principalId": "xxx",
    "requestType": "AdminAssign"
  }
}
{"error":{"code":"SubjectNotFound","message":"The subject is not found."}}
I am seeing what looks like the same issue.
I have a PowerShell script that creates a group in Azure AD via New-MgGroup and then immediately attempts to create a SQL database user for that group via Invoke-Sqlcmd with the query "CREATE USER [TheNewGroup] FROM EXTERNAL PROVIDER". The script reliably fails with the error "Principal 'TheNewGroup' could not be found or this principal type is not supported." This failure happens (a) even if I insert a call to Get-MgGroup before the call to Invoke-Sqlcmd to confirm the new group exists (this command reports it does); and (b) even if I use the ObjectId from that call to Get-MgGroup.
If I rerun the command after a few seconds it will succeed. I haven't been able to figure out a solution other than simply retrying until it succeeds.
Simple script to reproduce. This assumes you are connected to Azure as a user with the appropriate rights to the SQL database:
$ResourceGroupName = 'my-resource-group'
$SqlServerName = 'my-sql-server'
$DatabaseName = 'myDatabase'
$AzureSqlUrl = "https://database.windows.net"
$SqlServerUrl = (Get-AzSqlServer -ResourceGroupName $ResourceGroupName -ServerName $SqlServerName).FullyQualifiedDomainName
$SqlDbAccess = @{
    ServerInstance = $SqlServerUrl
    Database       = $DatabaseName
    AccessToken    = (Get-AzAccessToken -ResourceUrl $AzureSqlUrl).Token
}
$DbGroupParams = @{
    DisplayName     = 'MyNewGroup'
    MailEnabled     = $false
    MailNickname    = 'MyNewGroup'
    SecurityEnabled = $true
}
New-MgGroup @DbGroupParams
# this correctly returns the new group
Get-MgGroup -Filter "DisplayName eq 'MyNewGroup'"
# this fails the first time it is run though will eventually succeed if rerun successively
Invoke-Sqlcmd @SqlDbAccess -Query "CREATE USER [MyNewGroup] FROM EXTERNAL PROVIDER;"
Also, since this issue is quite old and may not have a quick resolution, here is a workaround retry script for anyone finding this. Obviously, substitute whatever command you are using to try to access the directory for the Invoke-Sqlcmd as appropriate, but do make sure to add -ErrorAction Stop to be able to catch the error.
I have found that it can take up to 15 seconds for the directory change to be visible to SQL. YMMV.
# name of the Azure AD group to add as a database user (the group created in the repro script)
$DatabaseAccessGroupName = 'MyNewGroup'
$Retries = 20
for ($i = 0; $i -lt $Retries; $i++)
{
    try
    {
        $Percentage = $i / $Retries * 100
        Write-Progress -Activity "Adding..." -Status "$Percentage% Complete:" -PercentComplete $Percentage
        # -ErrorAction Stop turns the failure into a terminating error so that catch runs
        Invoke-Sqlcmd @SqlDbAccess -Query "CREATE USER [$($DatabaseAccessGroupName)] FROM EXTERNAL PROVIDER;" -ErrorAction Stop
        # on success stop retrying
        break
    }
    catch
    {
        if ($i -ge ($Retries - 1))
        {
            # all retries have failed
            Write-Error "Failed to add $DatabaseAccessGroupName as database user"
        }
        else
        {
            # wait and try again
            Start-Sleep -Milliseconds 1000
        }
    }
}
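The same workaround could also be factored into a reusable wrapper so that any directory-dependent command can be retried. The sketch below is an assumption-laden illustration, not an Azure PowerShell feature: `Invoke-WithRetry` and its parameter names are hypothetical, and it swaps the fixed one-second delay for an exponential backoff. With the defaults shown it waits about 15 seconds in total across six attempts.

```powershell
# Hypothetical helper (not part of any Azure module): retries an action that
# throws, doubling the delay after each failed attempt up to a cap.
function Invoke-WithRetry {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [scriptblock] $Action,
        [int] $MaxAttempts = 6,
        [int] $InitialDelayMilliseconds = 500,
        [int] $MaxDelayMilliseconds = 8000
    )

    $delay = $InitialDelayMilliseconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # The action must raise terminating errors (for example via -ErrorAction Stop).
            return & $Action
        }
        catch {
            if ($attempt -eq $MaxAttempts) {
                throw   # out of attempts: rethrow the last error
            }
            Write-Verbose "Attempt $attempt failed: $($_.Exception.Message). Retrying in $delay ms."
            Start-Sleep -Milliseconds $delay
            $delay = [Math]::Min($delay * 2, $MaxDelayMilliseconds)
        }
    }
}
```

Usage would look something like `Invoke-WithRetry -Action { Invoke-Sqlcmd @SqlDbAccess -Query "CREATE USER [MyNewGroup] FROM EXTERNAL PROVIDER;" -ErrorAction Stop }`. Rethrowing the last error, rather than writing a generic message, keeps the original failure reason visible to the caller.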
Description
I have software that dynamically creates new users and assigns them rights to a new resource group that is also dynamically created. Once that is done, an event is fired, and I have a script that attempts to use New-AzureRmRoleAssignment to grant that new user permissions to a pre-existing resource group. The command being fired is a simple one-liner:
When this fires, the invocation fails every time with an error indicating "The provided information does not map to an AD object ID", meaning that it could not find the user from the UPN. If, however, I modify the script to try to look up the user repeatedly with delay, and then once the user is found create the new role assignment, it works fine. In both cases, however, I know that the user exists because I can sign in with them immediately in the Azure portal. Here is a revised version of the same script that works after a single retry:
This second version is much more complicated and shouldn't be necessary because the user already exists (100% guaranteed because we assign the user rights to the resource group we dynamically create before my script is invoked, which wouldn't be possible if the user did not exist). Also note that the second version works the second time it is invoked (a 250ms delay is all it takes).
All of this to say, there appears to be an issue when looking up new AD users the first time using Azure PowerShell.
Script/Steps for Reproduction
See above.
Module Version
6.2.1
Environment Data
Debug Output