Azure / kubelogin

A Kubernetes credential (exec) plugin implementing azure authentication
https://azure.github.io/kubelogin/
MIT License

No possibility to use Service Principal when Workload Identity is enabled in the Pod #375

Open daleksandrowiczgd opened 10 months ago

daleksandrowiczgd commented 10 months ago

Problem

We have been encountering this issue in our pipelines when we try to run kubectl commands on self-managed runners.

When Workload Identity variables are present in our Pod (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_FEDERATED_CREDENTIALS), we have no way to use another type of Azure AD object, such as a Service Principal, even if we have converted the kubeconfig using kubelogin convert-kubeconfig with Service Principal credentials.

When we migrated from the deprecated AAD Pod Identity, we had to add many workarounds to keep our pipelines and scripts working. For example, we had to override the AZURE_CLIENT_ID variable just for the duration of each kubectl command, which is redundant, time-consuming, and inconvenient.

We noticed that even if you are logged in via az login with SP credentials and run az aks get-credentials followed by kubelogin convert-kubeconfig -l spn with the Service Principal arguments --client-id <AZURE_CLIENT_ID> --client-secret <AZURE_CLIENT_SECRET> --tenant-id <AZURE_TENANT_ID>, the kubeconfig file still ends up with the Workload Identity client ID as a CLI argument of the kubelogin get-token command.

Moreover, even if we override AZURE_CLIENT_ID with the SP's client ID only while running kubelogin convert-kubeconfig -l spn, the next kubectl command still uses the Workload Identity.

The AZURE_CLIENT_ID environment variable always takes precedence over CLI arguments. I don't know if that's intended, but it certainly isn't obvious.

There should be an option to authenticate to the AKS cluster with a Service Principal even if Workload Identity variables are injected into the Pod. Because the command-line arguments in the kubeconfig are always overridden by the AZURE_CLIENT_ID variable, we have no flexibility.

How to reproduce

Prerequisites

Expected output

kubectl command runs with Service Principal credentials

Real output

kubectl command runs with Workload Identity credentials

weinong commented 10 months ago

@daleksandrowiczgd can you share what your kubeconfig looks like after kubelogin convert-kubeconfig -l spn? you only need to capture the exec plugin part:

      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: kubelogin
        args:
          - get-token
          - --environment
          - AzurePublicCloud
          - --server-id
          - <AAD server app ID>
          - --client-id
          - <AAD client app ID>
          - --tenant-id
          - <AAD tenant ID>

besides, in your repro step you seem to miss -l spn? not sure if this is intentional or not?

Convert kubeconfig with kubelogin convert-kubeconfig command and Service Principal parameters specified as arguments --client-id <AZURE_CLIENT_ID> --client-secret <AZURE_CLIENT_SECRET> --tenant-id <AZURE_TENANT_ID>

can you also share the actual relevant environment variables in your runner environment? There is no reference to AZURE_FEDERATED_CREDENTIALS in any repo I can find.

daleksandrowiczgd commented 10 months ago

besides, in your repro step you seem to miss -l spn? not sure if this is intentional or not?

Not intentional. I left it out of my message, so please assume it should be there.

can you also share the actual relevant environment variables in your runner environment? There is no reference to AZURE_FEDERATED_CREDENTIALS in any repo I can find.

Actually, I just listed all the environment variables set by Workload Identity, but it seems that only the AZURE_CLIENT_ID variable conflicts with the kubelogin tool.

can you share what your kubeconfig looks like after kubelogin convert-kubeconfig -l spn? you only need to capture the exec plugin part:

Yes, let me share a few cases with the exact commands I ran and the resulting kubeconfig file after each of them.

In all cases the following env variables will be set (I will use the same name to identify the same values in kubeconfig):

export SP_CLIENT_ID="<SP_CLIENT_ID>"                    # Client (Application) ID of the Service Principal
export SP_CLIENT_SECRET="<SP_CLIENT_SECRET>"            # Client secret of the Service Principal
export SP_TENANT_ID="<SP_TENANT_ID>"                    # Tenant ID of the Service Principal
export CLUSTER_RG="<CLUSTER_RG>"                        # AKS cluster resource group
export CLUSTER_NAME="<CLUSTER_NAME>"                    # AKS cluster name
export CUSTOM_KUBECONFIG_PATH="${HOME}/.kube/config"    # Path to the kubeconfig

Additionally, for better understanding of the output I will specify WI_CLIENT_ID (Workload Identity client ID), instead of the real value.

Pass Service Principal Client ID as the CLI argument in the kubelogin command (desired usage)

az login --service-principal -u "${SP_CLIENT_ID}" -p "${SP_CLIENT_SECRET}" -t "${SP_TENANT_ID}"
az aks get-credentials --resource-group "${CLUSTER_RG}" --name "${CLUSTER_NAME}" --file "${CUSTOM_KUBECONFIG_PATH}"
kubelogin convert-kubeconfig --login "spn" --kubeconfig "${CUSTOM_KUBECONFIG_PATH}" --client-id "${SP_CLIENT_ID}" --client-secret "${SP_CLIENT_SECRET}" --tenant-id "${SP_TENANT_ID}"

Kubeconfig:

exec:
  apiVersion: client.authentication.k8s.io/v1beta1
  args:
  - get-token
  - --login
  - spn
  - --server-id
  - 6dae42f8-4368-4678-94ff-3960e28e3630
  - --client-id
  - ${WI_CLIENT_ID}
  - --tenant-id
  - ${SP_TENANT_ID}
  - --environment
  - AzurePublicCloud
  - --client-secret
  - ${SP_CLIENT_SECRET}
  command: kubelogin
  env: null
  installHint: |2

    kubelogin is not installed which is required to connect to AAD enabled cluster.

    To learn more, please go to https://aka.ms/aks/kubelogin
  provideClusterInfo: false

$ kubectl get pod --kubeconfig ${CUSTOM_KUBECONFIG_PATH}
# RESPONSE 401 Unauthorized
# --------------------------------------------------------------------------------
# {
#   "error": "invalid_client",
#   "error_description": "AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '${WI_CLIENT_ID}'. Trace ID: 1d7b8101-9172-4b1e-8ac6-ac840f666301 Correlation ID: 9d204593-95f3-460e-81e6-fb6a94e4d959 Timestamp: 2024-01-03 10:45:10Z",
#   "error_codes": [
#     7000215
#   ],
#   "timestamp": "2024-01-03 10:45:10Z",
#   "trace_id": "1d7b8101-9172-4b1e-8ac6-ac840f666301",
#   "correlation_id": "9d204593-95f3-460e-81e6-fb6a94e4d959",
#   "error_uri": "https://login.microsoftonline.com/error?code=7000215"
# }
# --------------------------------------------------------------------------------
# To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret
# Unable to connect to the server: getting credentials: exec: executable kubelogin failed with exit code 1

Conclusion: the --client-id argument in the kubeconfig was overridden by the Workload Identity client ID. The kubectl command fails with a 401 error.
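
A quick way to check which client ID ended up in a converted kubeconfig is to read the value that follows --client-id in the exec args. The following is a minimal sketch, not part of kubelogin; the function name kubeconfig_client_id is hypothetical, and it assumes the one-argument-per-line layout shown above with unquoted values.

```shell
# Hypothetical helper: print the value that follows the "--client-id" entry
# in a kubeconfig's exec args list. Assumes one list item per line, as in the
# kubeconfig excerpts in this thread, and an unquoted value.
kubeconfig_client_id() {
  awk '
    # The line after "--client-id" holds the value: strip the "- " prefix.
    found { sub(/^[ \t]*-[ \t]*/, ""); print; exit }
    # Remember that the next list item is the client ID.
    /^[ \t]*-[ \t]*--client-id[ \t]*$/ { found = 1 }
  ' "$1"
}
```

Comparing the output of kubeconfig_client_id "${CUSTOM_KUBECONFIG_PATH}" with the Service Principal's client ID shows whether the conversion step picked up the intended value. As the later cases in this thread show, a correct value in the kubeconfig is not sufficient on its own, because AZURE_CLIENT_ID is still consulted when kubectl runs.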

Override AZURE_CLIENT_ID environment variable only for the kubelogin command

az login --service-principal -u "${SP_CLIENT_ID}" -p "${SP_CLIENT_SECRET}" -t "${SP_TENANT_ID}"
az aks get-credentials --resource-group "${CLUSTER_RG}" --name "${CLUSTER_NAME}" --file "${CUSTOM_KUBECONFIG_PATH}"
AZURE_CLIENT_ID="${SP_CLIENT_ID}" kubelogin convert-kubeconfig --login "spn" --kubeconfig "${CUSTOM_KUBECONFIG_PATH}" --client-id "${SP_CLIENT_ID}" --client-secret "${SP_CLIENT_SECRET}" --tenant-id "${SP_TENANT_ID}"

Kubeconfig:

exec:
  apiVersion: client.authentication.k8s.io/v1beta1
  args:
  - get-token
  - --login
  - spn
  - --server-id
  - 6dae42f8-4368-4678-94ff-3960e28e3630
  - --client-id
  - ${SP_CLIENT_ID}
  - --tenant-id
  - ${SP_TENANT_ID}
  - --environment
  - AzurePublicCloud
  - --client-secret
  - ${SP_CLIENT_SECRET}
  command: kubelogin
  env: null
  installHint: |2

    kubelogin is not installed which is required to connect to AAD enabled cluster.

    To learn more, please go to https://aka.ms/aks/kubelogin
  provideClusterInfo: false

$ kubectl get pod --kubeconfig ${CUSTOM_KUBECONFIG_PATH}
RESPONSE 401 Unauthorized
--------------------------------------------------------------------------------
{
  "error": "invalid_client",
  "error_description": "AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '${WI_CLIENT_ID}'. Trace ID: 854ccce1-bdf3-4122-8453-8e3ba73f9700 Correlation ID: 1371d7e8-053f-4f32-8f7b-9859bc27f635 Timestamp: 2024-01-03 11:00:37Z",
  "error_codes": [
    7000215
  ],
  "timestamp": "2024-01-03 11:00:37Z",
  "trace_id": "854ccce1-bdf3-4122-8453-8e3ba73f9700",
  "correlation_id": "1371d7e8-053f-4f32-8f7b-9859bc27f635",
  "error_uri": "https://login.microsoftonline.com/error?code=7000215"
}
--------------------------------------------------------------------------------
To troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#client-secret
Unable to connect to the server: getting credentials: exec: executable kubelogin failed with exit code 1

Conclusion: the --client-id argument in the kubeconfig got the correct client ID of the Service Principal. The kubectl command still fails with a 401 error, again with the Workload Identity client ID (WI_CLIENT_ID) in the error.

Override AZURE_CLIENT_ID environment variable only for kubelogin and kubectl commands

az login --service-principal -u "${SP_CLIENT_ID}" -p "${SP_CLIENT_SECRET}" -t "${SP_TENANT_ID}"
az aks get-credentials --resource-group "${CLUSTER_RG}" --name "${CLUSTER_NAME}" --file "${CUSTOM_KUBECONFIG_PATH}"
AZURE_CLIENT_ID="${SP_CLIENT_ID}" kubelogin convert-kubeconfig --login "spn" --kubeconfig "${CUSTOM_KUBECONFIG_PATH}" --client-id "${SP_CLIENT_ID}" --client-secret "${SP_CLIENT_SECRET}" --tenant-id "${SP_TENANT_ID}"

Kubeconfig:

exec:
  apiVersion: client.authentication.k8s.io/v1beta1
  args:
  - get-token
  - --login
  - spn
  - --server-id
  - 6dae42f8-4368-4678-94ff-3960e28e3630
  - --client-id
  - ${SP_CLIENT_ID}
  - --tenant-id
  - ${SP_TENANT_ID}
  - --environment
  - AzurePublicCloud
  - --client-secret
  - ${SP_CLIENT_SECRET}
  command: kubelogin
  env: null
  installHint: |2

    kubelogin is not installed which is required to connect to AAD enabled cluster.

    To learn more, please go to https://aka.ms/aks/kubelogin
  provideClusterInfo: false

$ AZURE_CLIENT_ID="${SP_CLIENT_ID}" kubectl get pod --kubeconfig ${CUSTOM_KUBECONFIG_PATH}
No resources found in default namespace.

Conclusion: the --client-id argument in the kubeconfig got the correct client ID of the Service Principal. kubectl works correctly because AZURE_CLIENT_ID was overridden for the duration of that command.
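
The per-command override used in this case can be wrapped in a small shell function so that scripts do not repeat the prefix on every call. This is only a sketch of the workaround described here: the name kubectl_as_sp is hypothetical, and it assumes SP_CLIENT_ID is set as in the variables listed earlier.

```shell
# Hypothetical wrapper: run kubectl with AZURE_CLIENT_ID set to the Service
# Principal's client ID for this one invocation only. The value injected into
# the Pod by Workload Identity is left untouched for every other command.
# Assumes SP_CLIENT_ID is already set in the calling shell.
kubectl_as_sp() {
  AZURE_CLIENT_ID="${SP_CLIENT_ID}" kubectl "$@"
}
```

For example, kubectl_as_sp get pod --kubeconfig "${CUSTOM_KUBECONFIG_PATH}" behaves like the command shown above.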

Override AZURE_CLIENT_ID environment variable only for the kubectl command

az login --service-principal -u "${SP_CLIENT_ID}" -p "${SP_CLIENT_SECRET}" -t "${SP_TENANT_ID}"
az aks get-credentials --resource-group "${CLUSTER_RG}" --name "${CLUSTER_NAME}" --file "${CUSTOM_KUBECONFIG_PATH}"
kubelogin convert-kubeconfig --login "spn" --kubeconfig "${CUSTOM_KUBECONFIG_PATH}" --client-id "${SP_CLIENT_ID}" --client-secret "${SP_CLIENT_SECRET}" --tenant-id "${SP_TENANT_ID}"

Kubeconfig:

exec:
  apiVersion: client.authentication.k8s.io/v1beta1
  args:
  - get-token
  - --login
  - spn
  - --server-id
  - 6dae42f8-4368-4678-94ff-3960e28e3630
  - --client-id
  - ${WI_CLIENT_ID}
  - --tenant-id
  - ${SP_TENANT_ID}
  - --environment
  - AzurePublicCloud
  - --client-secret
  - ${SP_CLIENT_SECRET}
  command: kubelogin
  env: null
  installHint: |2

    kubelogin is not installed which is required to connect to AAD enabled cluster.

    To learn more, please go to https://aka.ms/aks/kubelogin
  provideClusterInfo: false

$ AZURE_CLIENT_ID="${SP_CLIENT_ID}" kubectl get pod --kubeconfig ${CUSTOM_KUBECONFIG_PATH}
No resources found in default namespace.

Conclusion: the --client-id argument in the kubeconfig was overridden by the Workload Identity client ID. kubectl works correctly because AZURE_CLIENT_ID was overridden for the duration of that command.

As you can see, the AZURE_CLIENT_ID variable always takes precedence over the CLI arguments defined in the kubeconfig.

Hope everything is understandable.

enj commented 10 months ago

@daleksandrowiczgd does directly overriding the env field help as a workaround?

exec:
  apiVersion: client.authentication.k8s.io/v1beta1
  args:
  - get-token
  - ...
  command: kubelogin
  env:
  - name: "AZURE_CLIENT_ID"
    value: "value for client ID"

weinong commented 10 months ago

what is your environment? do you know how AZURE_CLIENT_ID is set?

daleksandrowiczgd commented 10 months ago

@weinong, our environment:

We migrated from the deprecated AAD Pod Identity to Workload Identity. In our GitLab CI jobs we want to be able to switch between Service Principal and Managed Identity credentials when needed (e.g. we sometimes have issues with Azure API throttling, so we need the option to use an SP in such cases). After migrating to Workload Identity, we encountered the problem I described in the issue description.

In general, how the runners config related to the Workload Identity looks like and how those variables are injected:

Right now, after the migration to Workload Identity, it's impossible to use either a Service Principal or a Managed Identity with a different client ID, because the AZURE_CLIENT_ID environment variable injected by Workload Identity always takes precedence in the kubelogin commands.

daleksandrowiczgd commented 10 months ago

@enj yes, this workaround works, thanks for the idea. But we would still need to modify every pipeline and script to update the kubeconfig file manually wherever we want to use the Service Principal credentials.

Maybe it would be good to add the AZURE_CLIENT_ID env variable to this exec section of the kubeconfig file when Service Principal is chosen in the kubelogin convert-kubeconfig command (--login "spn"), wdyt?

Or, even better, kubelogin could honor the --client-id CLI argument that is passed to the get-token command inside the kubeconfig.
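
Until something like that exists, the env workaround could probably be scripted instead of edited by hand. The sketch below uses the --exec-env flag of kubectl config set-credentials to add AZURE_CLIENT_ID to the exec section of the current user entry. The function name pin_sp_client_id is hypothetical, SP_CLIENT_ID is assumed to be set as in the cases above, and this has not been verified against a kubelogin-converted kubeconfig in this thread.

```shell
# Hypothetical helper: add AZURE_CLIENT_ID to the exec plugin's env list for
# the user of the current context in the given kubeconfig.
# Assumes SP_CLIENT_ID is set and the kubeconfig already has an exec section
# (i.e. kubelogin convert-kubeconfig has been run on it).
pin_sp_client_id() {
  kubeconfig="$1"
  # Look up the user name referenced by the current context.
  user="$(kubectl config view --kubeconfig "${kubeconfig}" --minify \
    -o jsonpath='{.users[0].name}')"
  # Set the env entry on that user's existing exec configuration.
  kubectl config set-credentials "${user}" \
    --kubeconfig "${kubeconfig}" \
    --exec-env="AZURE_CLIENT_ID=${SP_CLIENT_ID}"
}
```

Running pin_sp_client_id "${CUSTOM_KUBECONFIG_PATH}" right after kubelogin convert-kubeconfig should leave the rest of the exec section as it was and only add the env entry.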

sharebear commented 8 months ago

We just hit this issue in Azure DevOps too, while trying to migrate our self-hosted runners to use workload identity instead of aadpodidentity, and on a quick skim of the docs I don't see any way to manipulate the environment for these commands to implement a similar workaround.

In general, the idea that environment variables override provided CLI arguments is quite surprising; most tooling I'm used to uses environment variables as a fallback when CLI arguments aren't provided. If kubelogin used the environment as a fallback for missing CLI arguments, there would be no problem here at all.
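
The fallback order described above can be illustrated with a few lines of shell. This is only a sketch of the convention, not kubelogin's actual implementation; the function name resolve_client_id is hypothetical.

```shell
# Hypothetical illustration of "CLI argument first, environment as fallback":
# print the explicit CLI value when one is given, otherwise fall back to the
# AZURE_CLIENT_ID environment variable (which may itself be empty).
resolve_client_id() {
  cli_value="$1"
  if [ -n "${cli_value}" ]; then
    printf '%s\n' "${cli_value}"
  else
    printf '%s\n' "${AZURE_CLIENT_ID:-}"
  fi
}
```

With this ordering, a --client-id value stored in the kubeconfig would be used as written, and the variable injected by Workload Identity would only apply when no client ID was passed.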