Closed gldraphael closed 1 year ago
Which version of flux are you on? You can run flux version to check.
I want to see if I can reproduce this error on my end, so any more details on how you set up the cluster would be great. We have e2e tests for kubelet identity (but the cluster uses system identity)
Flux version returns this:
~ ❯ flux version
flux: v0.41.2
helm-controller: v0.31.2
kustomize-controller: v0.35.1
notification-controller: v0.33.0
source-controller: v0.36.1
I created the cluster from the azure portal but I'm happy to put together a terraform script if that will help.
Edit:
I think to reproduce this, the cluster should use Azure AD, and the cluster should have more than one User Assigned Managed Identity. I'm validating this assumption right now.
I created a new cluster with a single User Assigned Managed Identity (UAI):
Node pools
  Node pools: 1
  Enable virtual nodes: Disabled
Access
  Resource identity: System-assigned managed identity
  Local accounts: Disabled
  Authentication and Authorization: Azure AD authentication with Kubernetes RBAC
  Cluster admin group: Cluster Admin
  Encryption type: (Default) Encryption at-rest with a platform-managed key
Networking
  Network configuration: Kubenet
  Load balancer: Standard
  Private cluster: Disabled
  Authorized IP ranges: Disabled
  Network policy: None
Integrations
  Container registry: None
  Microsoft Defender for Cloud: Free
  Enable Container Logs: Disabled
  Alerts: Not enabled
  Azure Policy: Disabled
And I see the following error (which is similar to what I saw when I set AZURE_CLIENT_ID in the previous cluster with more than one UAI):
failed to get credential from azure: error exchanging token: failed to decode the response: invalid character '<' looking for beginning of value
Seems like this truly is a bug. Let me know if you have trouble reproducing this.
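For anyone reading the error above: "invalid character '<' looking for beginning of value" is what a JSON decoder reports when the body it receives starts with "<", which usually means an HTML or XML page came back instead of a JSON token response. A minimal sketch of that failure mode follows. The function name and the refresh_token field are placeholders for illustration, not source-controller code, and Python words its own decode error differently from Go.

```python
import json


def decode_token_response(body: str) -> dict:
    """Decode a token-exchange response body, failing loudly on non-JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        # Report the first non-blank character so an HTML/XML reply is obvious.
        first = body.lstrip()[:1]
        raise ValueError(f"response is not JSON (starts with {first!r})") from err


if __name__ == "__main__":
    # Hypothetical well-formed reply.
    print(decode_token_response('{"refresh_token": "abc"}'))
    # Hypothetical HTML reply, e.g. an error page from the wrong host.
    try:
        decode_token_response("<html><body>Not Found</body></html>")
    except ValueError as exc:
        print(exc)
```

If the body starts with "<", the thing to check is which host actually answered the request.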
Hey, sorry for the long wait. I just tested this on the latest version and it worked okay:
fleet-infra git:(main) flux -v
flux version 2.0.0-rc.2
I created an AKS cluster with the following properties (as stated in the previous comment)
I assigned an AcrPull role to the cluster's managed identity and it reconciled successfully. Next, I added a second managed identity to the cluster and it failed to reconcile (which is expected):
► annotating OCIRepository podinfo in flux-system namespace
✔ OCIRepository annotated
◎ waiting for OCIRepository reconciliation
✗ OCIRepository reconciliation failed: 'failed to get credential from azure: DefaultAzureCredential: failed to acquire a token.
Attempted credentials:
EnvironmentCredential: missing environment variable AZURE_TENANT_ID
WorkloadIdentityCredential: missing environment variables for workload identity. Check webhook and pod configuration
ManagedIdentityCredential: no default identity is assigned to this resource
AzureCLICredential: Azure CLI not found on path
Then I added the AZURE_CLIENT_ID env variable to the source-controller pod and it reconciled successfully.
Can you try upgrading to 2.0.0-rc.2?
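Some background on why the client ID matters: when more than one user-assigned identity is attached, the Azure Instance Metadata Service (IMDS) token endpoint needs a client_id (or resource ID) to know which identity to use. The sketch below only builds such a request and does not send it. The resource value and the all-zero client ID are placeholders, and the helper name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Well-known, link-local IMDS token endpoint (reachable only from inside an Azure VM/node).
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_token_request(resource: str, client_id: str) -> Request:
    """Build (but do not send) a token request for a specific user-assigned identity."""
    query = urlencode({
        "api-version": "2018-02-01",
        "resource": resource,
        # Selects which user-assigned identity to use when several are attached.
        "client_id": client_id,
    })
    # IMDS rejects requests that lack the "Metadata: true" header.
    return Request(f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


if __name__ == "__main__":
    req = build_imds_token_request(
        "https://management.azure.com/",          # placeholder resource
        "00000000-0000-0000-0000-000000000000",   # placeholder client ID
    )
    print(req.full_url)
```

Setting AZURE_CLIENT_ID on the controller has the same purpose: it tells the credential chain which identity to request.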
Thanks for testing this out, @somtochiama.
I just tested it with v2.0.0-rc.3 but unfortunately still see the same error:
failed to get credential from azure: error exchanging token: failed to decode the response: invalid character '<' looking for beginning of value
I will try again on Monday just to be certain.
I am still seeing the same error. I see it when I add the following source:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: myacr
  namespace: experiments
spec:
  type: oci
  provider: azure
  url: oci://myacr.azurecr.io
  interval: 5m
Are you able to reproduce this?
I was testing using OCIRepository instead of HelmRepository. I will try again today.
Hey @gldraphael,
I have been able to reproduce this. Can you try specifying the repository in the URL, i.e.:
spec:
  type: oci
  provider: azure
  url: oci://myacr.azurecr.io/<repo-name>
Well, that kinda works, but not quite. My chart is at oci://myacr.azurecr.io/clippy, not at oci://myacr.azurecr.io/charts/clippy.
Earlier, I tried:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: myacr
  namespace: experiments
spec:
  type: oci
  provider: azure
  url: oci://myacr.azurecr.io
  interval: 5m
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: clippy
  namespace: experiments
spec:
  releaseName: clippy
  chart:
    spec:
      chart: clippy
      sourceRef:
        kind: HelmRepository
        name: myacr
      version: 1.0.1
  interval: 50m
  install:
    remediation:
      retries: 3
  values: {}
And that shows the error I reported earlier.
Now, I tried the following:
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: clippy
  namespace: experiments
spec:
  type: oci
  provider: azure
  url: oci://myacr.azurecr.io/clippy
  interval: 5m
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: clippy
  namespace: experiments
spec:
  releaseName: clippy
  chart:
    spec:
      chart: clippy
      sourceRef:
        kind: HelmRepository
        name: clippy
        namespace: experiments
      version: 1.0.1
  interval: 50m
  install:
    remediation:
      retries: 3
  values: {}
I see no errors on the HelmRepository anymore, but the HelmChart shows the following error:
~/flux/flux get source chart experiments-clippy -n experiments
NAME REVISION SUSPENDED READY MESSAGE
experiments-clippy False False chart pull error: failed to download chart for remote reference: failed to get 'oci://myacr.azurecr.io/clippy/clippy:1.0.1': myacr.azurecr.io/clippy/clippy:1.0.1: not found
It appears to be trying to get the chart from the wrong place: myacr.azurecr.io/clippy/clippy:1.0.1 instead of myacr.azurecr.io/clippy:1.0.1.
I think a possible workaround may be to move my chart to myacr.azurecr.io/charts/clippy:1.0.1.
What I do not understand is why I no longer see any error on the HelmRepository when I use oci://myacr.azurecr.io/clippy as opposed to oci://myacr.azurecr.io. Does that URL always expect a base path after the origin?
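Judging only from the error message above, the chart reference looks like the repository URL, then the chart name, then the version. The sketch below reproduces that composition. It is inferred from the observed output, not taken from the source-controller code, and chart_reference is a hypothetical helper.

```python
def chart_reference(repo_url: str, chart: str, version: str) -> str:
    """Join a HelmRepository URL, a chart name and a version into one OCI reference."""
    return f"{repo_url.rstrip('/')}/{chart}:{version}"


if __name__ == "__main__":
    # URL that already ends in the chart name: the name appears twice.
    print(chart_reference("oci://myacr.azurecr.io/clippy", "clippy", "1.0.1"))
    # Registry root: resolves to where the chart actually lives.
    print(chart_reference("oci://myacr.azurecr.io", "clippy", "1.0.1"))
    # Proposed workaround layout with a charts/ prefix.
    print(chart_reference("oci://myacr.azurecr.io/charts", "clippy", "1.0.1"))
```

Under that assumption, the URL is a prefix for all charts in the repository and should not include the chart name itself.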
I think a possible workaround may be to move my chart to myacr.azurecr.io/charts/clippy:1.0.1.
Yes, you would have to use this as a workaround while I get this fixed.
The HelmRepository should work with the repository root address but right now there's a bug that prevents it from doing so. When exchanging the token, it makes a request to index.docker.io due to some defaulting in a library we use.
Thanks for reporting this!
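The thread does not name the library or show its code, so the following is only a simplified sketch of the common Docker-style convention that can cause this kind of defaulting. A reference with no "/" is treated as a repository name on Docker Hub, not as a registry host. Tags, digests and the "library/" prefix are deliberately ignored here.

```python
from typing import Tuple

DEFAULT_REGISTRY = "index.docker.io"


def split_reference(ref: str) -> Tuple[str, str]:
    """Split a Docker-style reference into (registry, repository), simplified."""
    first, sep, rest = ref.partition("/")
    # The first component counts as a registry host only if something follows it
    # and it looks like a host name (contains "." or ":", or is "localhost").
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if sep and looks_like_host:
        return first, rest
    # Otherwise the whole string is a repository on the default registry.
    return DEFAULT_REGISTRY, ref


if __name__ == "__main__":
    print(split_reference("myacr.azurecr.io/clippy"))  # registry is myacr.azurecr.io
    print(split_reference("myacr.azurecr.io"))         # falls back to index.docker.io
```

With a bare registry root there is no path component, so the host name is read as a repository and the request goes to the default registry.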
Ah! Feel free to let me know if you'd like me to test anything. Appreciate your patience here!
@gldraphael This issue will be fixed in the latest release of flux
@somtochiama - I tested this out, it works! Thanks!
@gldraphael any advice for this when using the flux extension? I can't get this working either without setting the ClientID somehow, but because I'm using the flux extension there doesn't seem to be a way to cleanly patch the source controller manifests.
@joshuadmatthews - I have never used the Azure Flux extensions. I think the best thing to do would be to ask Azure Support if you haven't already. Their extensions should be covered, if I'm not mistaken. Let us know what they say here!
But since you asked for my advice, I'd say avoid the extensions as far as you can!
I was able to get it working by deploying a patch with kubectl, which allows me to target a resource rather than a manifest. It would be nice if flux had a way to apply patches directly instead of having to patch a yaml file that is also in source control.
@joshuadmatthews Flux can patch existing objects in-cluster, but being a GitOps tool, the patch must be specified in source control. Here is an example: https://fluxcd.io/flux/faq/#how-to-patch-coredns-and-other-pre-installed-addons
Also please note that we don't offer support for Azure extensions; you need to raise the ACR auth issue with Microsoft support. When installing Flux using flux bootstrap, here is how you can set the ClientID: https://fluxcd.io/flux/installation/configuration/workload-identity/#azure-workload-identity
Thanks @stefanprodan, good to know there is a method to match resources that weren’t originally added by flux.
With the Azure extensions, I did eventually find a document that described how to configure the extensions to set up workload identity.
@joshuadmatthews Did you get it working with the Azure flux-extension? Can you share the document about configuring the extension to set up workload identity?
I am using the Azure flux-extension and am having an issue authenticating to ACR with the kubelet identity.
"error":"failed to get credential from 'azure': DefaultAzureCredential: failed to acquire a token.\nAttempted credentials:\n\tEnvironmentCredential: missing environment variable AZURE_TENANT_ID\n\tWorkloadIdentityCredential: no client ID specified. Check pod configuration or set ClientID in the options\n\tManagedIdentityCredential: failed to authenticate a system assigned identity. The endpoint responded with {\"error\":\"invalid_request\",\"error_description\":\"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request\"}\n\tAzureCLICredential: Azure CLI not found on path\n\tAzureDeveloperCLICredential: Azure Developer CLI not found on path"
@gxy12280421 see the Workload Identity section here
az k8s-extension create --resource-group
You can do an update instead of a create if you already installed flux with Bicep/ARM.
@joshuadmatthews Thank you very much for the quick info, which pointed me in the right direction. I got it working by adding useKubeletIdentity = "true" in the Azure flux extension, since I had assigned the ACRPull permission to the kubelet identity.
I created a test cluster exp-aks-02 with the following configuration:
(The cluster does not use the ACR integration.)
I then went ahead and bootstrapped flux, and assigned ACR Pull and Reader permissions to the User Assigned Managed Identity exp-aks-02-agentpool on an ACR instance.
At this point, I expected it to just work, but flux get sources would show this error:
Ideas?
Other Observations
Fetching token by specifying the UAI to use
I followed the thread at https://github.com/fluxcd/source-controller/issues/898 and concluded the reason this happens is because I have two UAIs (User Assigned managed Identities) attached to this cluster (exp-aks-02-agentpool and aciconnectorlinux-exp-aks-02).
So I tried patching the flux-system kustomization to add AZURE_CLIENT_ID:
But I now see this error (which almost feels like a bug):
However, hitting the token API directly works as long as I include the client_id parameter:
akv2k8s works ok
I am able to consume secrets from azure keyvault using the akv2k8s project, which appears to use the userAssignedIdentityID value from /etc/kubernetes/azure.json: