Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

az aks create fails to create loadbalancer #167

Closed hibri closed 6 years ago

hibri commented 6 years ago

I'm creating an AKS cluster with the CLI. I have tried this with CLI versions 2.0.25 and 2.0.26, in westeurope:

az aks create --resource-group --name --node-count 4 --generate-ssh-keys

When I run kubectl describe svc, I see this:

› kubectl describe svc
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.0.0.1
Port:              https  443/TCP
TargetPort:        443/TCP
Endpoints:         <redacted>
Session Affinity:  ClientIP
Events:
  Type     Reason                      Age              From                Message
  ----     ------                      ----             ----                -------
  Warning  CreatingLoadBalancerFailed  1m (x7 over 6m)  service-controller  Error creating load balancer (will retry): Error getting LB for service default/kubernetes: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/<redacted>/resourceGroups/MC_tmp-<redacted>_westeurope/providers/Microsoft.Network/loadBalancers/kubernetes?api-version=2017-03-01: StatusCode=0 -- Original Error: adal: Refresh request failed. Status Code = '401'

devonbarrett commented 6 years ago

Also experiencing this while creating LBs and Azure disk based PVs:

****    1m         16m         62        ******                            PersistentVolumeClaim                             Warning   ProvisioningFailed                 persistentvolume-controller         Failed to provision volume with StorageClass "managed-premium": azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/********/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-********?api-version=2016-04-30-preview: StatusCode=0 -- Original Error: adal: Refresh request failed. Status Code = '401'
****    1m         16m         9         ******                            Service                                           Warning   CreatingLoadBalancerFailed         service-controller                  Error creating load balancer (will retry): Error getting LB for service *****: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/********_westeurope/providers/Microsoft.Network/loadBalancers/kubernetes?api-version=2017-03-01: StatusCode=0 -- Original Error: adal: Refresh request failed. Status Code = '401'

hanzenok commented 6 years ago

Experiencing the same issue with K8s 1.8.7 and CLI 2.0.23 on westeurope

hibri commented 6 years ago

My issue was fixed by using a previously created service principal, created with:

az ad sp create-for-rbac

NiPfi commented 6 years ago

I had the same issue and was able to resolve it using a service principal like @hibri mentioned. To elaborate on his comment: I created a new AKS cluster using the appId returned by az ad sp create-for-rbac as the service-principal and the password as the client-secret.
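
A minimal sketch of that workaround, assuming the JSON printed by az ad sp create-for-rbac contains the appId and password fields mentioned above, and reusing the flags that appear elsewhere in this thread (--service-principal, --client-secret, --node-count, --generate-ssh-keys). The helper names are hypothetical, and the resource group and cluster name are supplied by the caller:

```python
import json
import subprocess


def create_service_principal():
    """Run `az ad sp create-for-rbac` and return its JSON output as a dict."""
    result = subprocess.run(
        ["az", "ad", "sp", "create-for-rbac", "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def aks_create_args(resource_group, cluster_name, sp, node_count=4):
    """Build an `az aks create` argv that passes the service principal explicitly."""
    return [
        "az", "aks", "create",
        "--resource-group", resource_group,
        "--name", cluster_name,
        "--node-count", str(node_count),
        "--generate-ssh-keys",
        # appId and password are the fields of the create-for-rbac output.
        "--service-principal", sp["appId"],
        "--client-secret", sp["password"],
    ]
```

Passing the result of aks_create_args(...) to subprocess.run(..., check=True) would then create the cluster. Building the argument list separately keeps the secret out of a shell string and makes the command easy to inspect before running it.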

hanzenok commented 6 years ago

The solution of using a new service principal worked for me too. Which is weird, because my previous service principal worked fine before and now does not.

Also, with the previous service principal, the cluster was suddenly unable to pull images from ACR (ErrImagePull and ImagePullBackOff errors on the pods). With the new one everything works fine.

slack commented 6 years ago

Interesting. When not given an explicit SP, az aks create should create a service principal for your clusters and stash it here: ~/.azure/aksServicePrincipal.json

We then add a roleAssignment for the node resource group. If you have the original SP floating around, I'd like to see the output of:

az aks show -g <resourceGroup> -n <clusterName> -o json | jq '.servicePrincipalProfile.clientId'
az role assignment list --all --assignee <servicePrincipalProfile.clientId>

hanzenok commented 6 years ago

I've recreated a new cluster with the original SP credentials passed via the --service-principal and --client-secret parameters. When I run az role assignment list --all, the cluster is on the list.

devonbarrett commented 6 years ago

@slack:

~> az aks show -g ***** -n ****** -o json | jq '.servicePrincipalProfile.clientId'
"28d283d1-84cf-4d38-a9a3-6887b559e8e2"
~> az role assignment list --all --assignee "28d283d1-84cf-4d38-a9a3-6887b559e8e2"
[
  {
    "id": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13/providers/Microsoft.Authorization/roleAssignments/8209a0b2-e222-4585-9fc6-56a685e8464d",
    "name": "8209a0b2-e222-4585-9fc6-56a685e8464d",
    "properties": {
      "additionalProperties": {
        "createdBy": "630f845e-64d4-4b97-983f-77061c4e739b",
        "createdOn": "2017-01-26T10:31:56.8694364Z",
        "updatedBy": "630f845e-64d4-4b97-983f-77061c4e739b",
        "updatedOn": "2017-01-26T10:31:56.8694364Z"
      },
      "principalId": "5f0b96b0-a7ce-4da8-bc11-ffc7823e482b",
      "principalName": "http://a6b8ab.fs-dev.None.cloudapp.azure.com",
      "roleDefinitionId": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13/providers/Microsoft.Authorization/roleDefinitions/8e3af657-a8ff-443c-a75c-2fe8c4bcb635",
      "roleDefinitionName": "Owner",
      "scope": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13"
    },
    "type": "Microsoft.Authorization/roleAssignments"
  },
  {
    "id": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13/resourceGroups/*********/providers/Microsoft.Authorization/roleAssignments/5250c1c7-374f-4a51-a309-ba1009eb25bd",
    "name": "5250c1c7-374f-4a51-a309-ba1009eb25bd",
    "properties": {
      "additionalProperties": {
        "createdBy": "9ae093c1-947d-4ba5-9260-8e36a6263037",
        "createdOn": "2018-01-04T23:22:46.8477433Z",
        "updatedBy": "9ae093c1-947d-4ba5-9260-8e36a6263037",
        "updatedOn": "2018-01-04T23:22:46.8477433Z"
      },
      "principalId": "5f0b96b0-a7ce-4da8-bc11-ffc7823e482b",
      "principalName": "http://a6b8ab.fs-dev.None.cloudapp.azure.com",
      "roleDefinitionId": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c",
      "roleDefinitionName": "Contributor",
      "scope": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13/resourceGroups/*********"
    },
    "resourceGroup": "**************",
    "type": "Microsoft.Authorization/roleAssignments"
  },
  {
    "id": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13/resourceGroups/**********/providers/Microsoft.Authorization/roleAssignments/08224439-000b-4f46-aa9b-6cdb9fb6588e",
    "name": "08224439-000b-4f46-aa9b-6cdb9fb6588e",
    "properties": {
      "additionalProperties": {
        "createdBy": "9ae093c1-947d-4ba5-9260-8e36a6263037",
        "createdOn": "2017-12-04T11:29:57.0252360Z",
        "updatedBy": "9ae093c1-947d-4ba5-9260-8e36a6263037",
        "updatedOn": "2017-12-04T11:29:57.0252360Z"
      },
      "principalId": "5f0b96b0-a7ce-4da8-bc11-ffc7823e482b",
      "principalName": "http://a6b8ab.fs-dev.None.cloudapp.azure.com",
      "roleDefinitionId": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c",
      "roleDefinitionName": "Contributor",
      "scope": "/subscriptions/97574325-76e1-46ec-84ce-cbc547063f13/resourceGroups/**********"
    },
    "resourceGroup": "**********",
    "type": "Microsoft.Authorization/roleAssignments"
  }
]

slack commented 6 years ago

Thanks @devonbarrett. To my eye, the SP looks right and appears to have appropriate scopes on the node resource groups. It even has a role assignment scoped at the subscription.

Any chance that the client secret/password was changed after creation?
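
The scope check described above can be sketched in a few lines. This is an illustration only: it assumes the JSON shape shown in the az role assignment list output earlier in the thread (fields nested under "properties"), falls back to top-level fields if they are not nested, and treats Owner and Contributor as sufficient roles. The function name is hypothetical.

```python
def covering_assignments(assignments, target_scope, roles=("Owner", "Contributor")):
    """Return the role assignments whose scope covers target_scope.

    A scope covers the target when it is the target itself or one of its
    ancestors, for example the subscription that contains a resource group.
    """
    target = target_scope.rstrip("/").lower()
    matches = []
    for assignment in assignments:
        # The output in this thread nests the fields under "properties".
        props = assignment.get("properties", assignment)
        scope = props["scope"].rstrip("/").lower()
        covers = target == scope or target.startswith(scope + "/")
        if covers and props["roleDefinitionName"] in roles:
            matches.append(props)
    return matches
```

An empty result for the node resource group's ID would point to a missing role assignment. A non-empty result, as in this thread, suggests the permissions are fine and the problem lies with the credential itself.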

devonbarrett commented 6 years ago

@slack Nope, we haven't changed them. This cluster was actually working for a while before the warnings started to appear.

sundaxi commented 6 years ago

Same issue here. Kindly help investigate it further.

[
  {
    "id": "/subscriptions/xxxxx-xxx/resourceGroups/xxxxx/providers/Microsoft.Authorization/roleAssignments/e88342cf-ba59-4fcd-b32e-b4f9461795f1",
    "name": "e88342cf-ba59-4fcd-b32e-b4f9461795f1",
    "properties": {
      "additionalProperties": {
        "createdBy": "8745ea67-9a89-4b26-bacf-137dce1466ea",
        "createdOn": "2018-02-19T19:32:51.0380730Z",
        "updatedBy": "8745ea67-9a89-4b26-bacf-137dce1466ea",
        "updatedOn": "2018-02-19T19:32:51.0380730Z"
      },
      "principalId": "bxxxxxxxxxxxxxxxxxxxx",
      "principalName": "https://qcri.org/5a8629eb-7950-43f9-810c-4e3342925f7e",
      "roleDefinitionId": "/subscriptions/xxxxxxxxxxxxxxx/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c",
      "roleDefinitionName": "Contributor",
      "scope": "/subscriptions/xxxxxxxxxxxxxxx/resourceGroups/xxxxxxxxxxxxxxx"
    },
    "resourceGroup": "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "type": "Microsoft.Authorization/roleAssignments"
  },
  {
    "id": "/subscriptions/xxxxxxxxxxxxxxxxxxxxxx/resourceGroups/xxxxxxxxxxxxxxxxxxxxxxx/providers/Microsoft.Authorization/roleAssignments/34ed75f1-b69e-452b-bde9-45539b0e21aa",
    "name": "34ed75f1-b69e-452b-bde9-45539b0e21aa",
    "properties": {
      "additionalProperties": {
        "createdBy": "8745ea67-9a89-4b26-bacf-137dce1466ea",
        "createdOn": "2018-02-21T09:44:43.4599281Z",
        "updatedBy": "8745ea67-9a89-4b26-bacf-137dce1466ea",
        "updatedOn": "2018-02-21T09:44:43.4599281Z"
      },
      "principalId": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "principalName": "https://qcri.org/5a8629eb-7950-43f9-810c-4e3342925f7e",
      "roleDefinitionId": "/subscriptions/xxxxxxxxxxxxxxxxxxxxx/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c",
      "roleDefinitionName": "Contributor",
      "scope": "/subscriptions/xxxxxxxxxxxxxxxxxxxxxxxxxxxx/resourceGroups/xxxxxxxxxxxxxxxxxxxxxxs"
    },
    "resourceGroup": "xxxxxxxxxxxxxxxxxxxxxxxxx",
    "type": "Microsoft.Authorization/roleAssignments"
  }
]

slack commented 6 years ago

@devonbarrett Can you run az ad sp show --id 28d283d1-84cf-4d38-a9a3-6887b559e8e2? I'm curious about the validity of the credential.

devonbarrett commented 6 years ago

@slack

{
  "additionalProperties": {
    "accountEnabled": true,
    "addIns": [],
    "alternativeNames": [],
    "appDisplayName": "fs-dev",
    "appOwnerTenantId": "8efd2a0f-40da-4fba-a9a7-f046197d6b67",
    "appRoleAssignmentRequired": false,
    "appRoles": [],
    "deletionTimestamp": null,
    "errorUrl": null,
    "homepage": "http://a6b8ab.fs-dev.None.cloudapp.azure.com",
    "keyCredentials": [],
    "logoutUrl": null,
    "oauth2Permissions": [
      {
        "adminConsentDescription": "Allow the application to access fs-dev on behalf of the signed-in user.",
        "adminConsentDisplayName": "Access fs-dev",
        "id": "41669e87-fa90-4022-9548-507f9bfb51dd",
        "isEnabled": true,
        "type": "User",
        "userConsentDescription": "Allow the application to access fs-dev on your behalf.",
        "userConsentDisplayName": "Access fs-dev",
        "value": "user_impersonation"
      }
    ],
    "odata.metadata": "https://graph.windows.net/8efd2a0f-40da-4fba-a9a7-f046197d6b67/$metadata#directoryObjects/Microsoft.DirectoryServices.ServicePrincipal/@Element",
    "odata.type": "Microsoft.DirectoryServices.ServicePrincipal",
    "passwordCredentials": [],
    "preferredTokenSigningKeyThumbprint": null,
    "publisherName": "Default Directory",
    "replyUrls": [],
    "samlMetadataUrl": null,
    "servicePrincipalType": "Application",
    "tags": [],
    "tokenEncryptionKeyId": null
  },
  "appId": "28d283d1-84cf-4d38-a9a3-6887b559e8e2",
  "displayName": "fs-dev",
  "objectId": "5f0b96b0-a7ce-4da8-bc11-ffc7823e482b",
  "objectType": "ServicePrincipal",
  "servicePrincipalNames": [
    "http://a6b8ab.fs-dev.None.cloudapp.azure.com",
    "28d283d1-84cf-4d38-a9a3-6887b559e8e2"
  ]
}

weinong commented 6 years ago

@devonbarrett can you check two things:

  1. Run az ad app show --id 28d283d1-84cf-4d38-a9a3-6887b559e8e2. It will return something like the output below, which shows whether the credential has expired.
    "passwordCredentials": [
      {
        "customKeyIdentifier": null,
        "endDate": "2018-03-20T21:05:51.05505Z",
        "keyId": "e08c4ce1-6999-4035-9e13-a2ea7b5f6145",
        "startDate": "2017-03-20T21:05:51.05505Z",
        "value": null
      }
    ],
  2. Validate the password with az login --service-principal -u $APP_ID -p $SP_PASSWORD

devonbarrett commented 6 years ago

I'm unsure what to use as the password, as this was created by AKS. @weinong, here's the output of the first command, az ad app show --id 28d283d1-84cf-4d38-a9a3-6887b559e8e2:

{
  "passwordCredentials": [
    {
      "customKeyIdentifier": "QwBJAA==",
      "endDate": "2299-12-31T00:00:00Z",
      "keyId": "d051b409-423c-41df-ba79-5f38e552de7e",
      "startDate": "2017-01-27T15:25:37.9541786Z",
      "value": null
    },
    {
      "customKeyIdentifier": null,
      "endDate": "2018-01-26T10:29:49.043107Z",
      "keyId": "20e77fca-37d4-4e9f-8f2c-343f3f63be1d",
      "startDate": "2017-01-26T10:29:49.043107Z",
      "value": null
    }
  ]
}

weinong commented 6 years ago

I bet the credential that was used to create that AKS cluster has expired. If you really want to dig into this to verify, you can get onto the nodes and find the password in /etc/kubernetes/azure.json. In any case, there is currently no way to update this password, so I'd suggest creating another AKS cluster and specifying the SPN explicitly.
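
A rough way to check for expired credentials in the az ad app show output, sketched here under the assumption that the JSON has the passwordCredentials shape shown earlier in the thread. The function names are hypothetical, and the sample data is trimmed to the fields the check needs. Fractional seconds are ignored because the thread's timestamps carry a varying number of digits.

```python
from datetime import datetime, timezone


def parse_azure_timestamp(value):
    """Parse a timestamp such as 2018-01-26T10:29:49.043107Z as UTC, dropping fractional seconds."""
    return datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def expired_credentials(app, now=None):
    """Return the keyIds of passwordCredentials whose endDate is at or before `now`."""
    now = now or datetime.now(timezone.utc)
    return [
        cred["keyId"]
        for cred in app.get("passwordCredentials", [])
        if parse_azure_timestamp(cred["endDate"]) <= now
    ]


# The two entries posted earlier in the thread.
SAMPLE = {
    "passwordCredentials": [
        {"keyId": "d051b409-423c-41df-ba79-5f38e552de7e", "endDate": "2299-12-31T00:00:00Z"},
        {"keyId": "20e77fca-37d4-4e9f-8f2c-343f3f63be1d", "endDate": "2018-01-26T10:29:49.043107Z"},
    ]
}

print(expired_credentials(SAMPLE, now=datetime(2018, 2, 1, tzinfo=timezone.utc)))
# prints ['20e77fca-37d4-4e9f-8f2c-343f3f63be1d']
```

Evaluated as of 1 February 2018, the second credential is reported as expired. The thread does not confirm which of the two passwords the cluster was actually using, so this only narrows down the likely cause.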

tslavik commented 6 years ago

@weinong Really? I have a similar issue. My SP key expired and I generated a new one, but how can I change the key for AKS?

bpatra commented 6 years ago

I can reproduce (tried two times).

seanknox commented 6 years ago

Believe this has been fixed; please reopen if still an issue.