Azure / ARO-RP

Azure Red Hat OpenShift RP
https://azure.microsoft.com/products/openshift/
Apache License 2.0

e2e fails because of msft network rule changes? #558

Open jim-minter opened 4 years ago

jim-minter commented 4 years ago

2020-04-22T14:14:58.8996065Z time="2020-04-22T14:14:58Z" level=info msg="Code=\"DeploymentFailed\" Message=\"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.\" Details=[{\"code\":\"Forbidden\",\"message\":\"{\r\n \\"error\\": {\r\n \\"code\\": \\"AuthorizationFailed\\",\r\n \\"message\\": \\"The client '68d44b5e-6596-49ae-8419-ad567332ac6d' with object id '68d44b5e-6596-49ae-8419-ad567332ac6d' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/read' over scope '/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/aro-v4-e2e-rg-954a6504-eastus/providers/Microsoft.Compute/virtualMachines/v4-e2e-954a6504-x24k5-master-1' or the scope is invalid. If access was recently granted, please refresh your credentials.\\"\r\n }\r\n}\"},{\"code\":\"Forbidden\",\"message\":\"{\r\n \\"error\\": {\r\n \\"code\\": \\"AuthorizationFailed\\",\r\n \\"message\\": \\"The client '68d44b5e-6596-49ae-8419-ad567332ac6d' with object id '68d44b5e-6596-49ae-8419-ad567332ac6d' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/read' over scope '/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/aro-v4-e2e-rg-954a6504-eastus/providers/Microsoft.Compute/virtualMachines/v4-e2e-954a6504-x24k5-bootstrap' or the scope is invalid. 
If access was recently granted, please refresh your credentials.\\"\r\n }\r\n}\"},{\"code\":\"Forbidden\",\"message\":\"{\r\n \\"error\\": {\r\n \\"code\\": \\"AuthorizationFailed\\",\r\n \\"message\\": \\"The client '68d44b5e-6596-49ae-8419-ad567332ac6d' with object id '68d44b5e-6596-49ae-8419-ad567332ac6d' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/read' over scope '/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/aro-v4-e2e-rg-954a6504-eastus/providers/Microsoft.Compute/virtualMachines/v4-e2e-954a6504-x24k5-master-0' or the scope is invalid. If access was recently granted, please refresh your credentials.\\"\r\n }\r\n}\"},{\"code\":\"Forbidden\",\"message\":\"{\r\n \\"error\\": {\r\n \\"code\\": \\"AuthorizationFailed\\",\r\n \\"message\\": \\"The client '68d44b5e-6596-49ae-8419-ad567332ac6d' with object id '68d44b5e-6596-49ae-8419-ad567332ac6d' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/read' over scope '/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/aro-v4-e2e-rg-954a6504-eastus/providers/Microsoft.Compute/virtualMachines/v4-e2e-954a6504-x24k5-master-2' or the scope is invalid. 
If access was recently granted, please refresh your credentials.\\"\r\n }\r\n}\"},{\"code\":\"BadRequest\",\"message\":\"{\\r\\n \\\"error\\\": {\\r\\n \\\"code\\\": \\\"PrivateLinkServiceCannotBeCreatedInSubnetThatHasNetworkPoliciesEnabled\\\",\\r\\n \\\"message\\\": \\\"Private link service /subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourceGroups/aro-v4-e2e-rg-954a6504-eastus/providers/Microsoft.Network/privateLinkServices/v4-e2e-954a6504-x24k5-pls cannot be created in a subnet /subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourceGroups/v4-e2e-rg-954a6504-eastus/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/v4-e2e-954a6504-master since it has private link service network policies enabled.\\\",\\r\\n \\\"details\\\": []\\r\\n }\\r\\n}\"}]" func="install.(*Installer).deployARMTemplate.func1()" file="pkg/install/install.go:405" client_principal_name= client_request_id=91f0e73f-84a0-11ea-93d7-000d3ac3fc7d component=backend correlation_id= request_id=fa81bc7e-fa7e-47b4-a726-86095fc0c509 resource_group=v4-e2e-rg-954a6504-eastus resource_id=/subscriptions/46626fc5-476d-41ad-8c76-2ec49c6994eb/resourcegroups/v4-e2e-rg-954a6504-eastus/providers/microsoft.redhatopenshift/openshiftclusters/v4-e2e-954a6504 resource_name=v4-e2e-954a6504 subscription_id=46626fc5-476d-41ad-8c76-2ec49c6994eb

jim-minter commented 4 years ago

PrivateLinkServiceCannotBeCreatedInSubnetThatHasNetworkPoliciesEnabled

jim-minter commented 4 years ago

is this a missing validation, or a race, or both?

JackQuincy commented 4 years ago

This is probably a race condition between Azure Policy in the MSFT tenant and cluster creation.

jim-minter commented 4 years ago

Sorry, scrappy comment from me. Agreed there's almost certainly a race; the question is whether there is a validation we could add nonetheless.

mjudeikis commented 4 years ago

I think this is NoFix. This error comes from deployment (not PLC creation). We also see multiple errors here: credentials propagation and PLS errors.

At best we can do this: https://github.com/Azure/ARO-RP/compare/master...mjudeikis:pls.race?expand=1 and propagate the error to the customer. But this does not solve the real problem, as the ordering is very tight:

***VALIDATION***
action(i.createDNS),
action(func(ctx context.Context) error {
    return i.deployStorageTemplate(ctx, installConfig, platformCreds, image)
}),
action(i.createBillingRecord),
action(i.deployResourceTemplate),
action(i.createPrivateEndpoint),

The modification happens while DNS is being created and storage is being deployed. I have seen a case where credentials propagation for DNS took ~8 minutes, so by the time we reach deployResourceTemplate we have already lost the race. We could add validation again, but would we do this for every possible component and validate everything on each phase?

It looks like a very unreasonable fix...