mtszkw opened this issue 3 years ago.
Did you apply the kubeflow.yaml file? It looks like all that was deployed is Argo CD.
Yes, I did:
./setup_repo.sh examples/setup.conf
kustomize build distribution/external-secrets/ | kubectl apply -f -
kustomize build distribution/argocd/ | kubectl apply -f -
kubectl apply -f distribution/kubeflow.yaml
I just noticed that I used an invalid git repo URL (without the .git suffix).
No idea if this was the issue, but I am now trying to set it all up from scratch.
The best way to debug this is usually the Argo CD UI. Given that nothing was deployed, you'll probably see an error on the Kubeflow application in the UI. Personally, I don't use the .git suffix for the repo URL and I haven't noticed any problems with that.
I could see one error in the Kubeflow app saying that the git repo secret was not found.
That probably blocked the deployment. I need to run an errand at the moment, but I can look at what is causing this afterwards.
Thanks a lot, I'll be waiting. The exact message was: secret "git-repo-secret" not found. I can see that git-repo-secret is used in secret.yaml and configmap-patch.yaml, but honestly I don't know how it is supposed to be used.
What you can do is manually apply the applications you want to deploy from the argocd-applications directory. The kubeflow.yaml file is mainly meant as a convenient way to deploy everything at once. This way you don't have to wait for me to continue with your work.
I redeployed and I still see ComparisonError: secret "git-repo-secret" not found. How (and why) should I set up this secret for a public git repo? On top of that, I am unable to sync or edit my app configuration in the Argo CD UI, because:
Unable to load data: Request has been terminated Possible causes: the network is offline, Origin is not allowed by Access-Control-Allow-Origin, the page is being unloaded, etc.
That might make sense, since I am running this deployment only on localhost, so I guess it could all be offline.
I've seen that Argo UI error come up a few times, and refreshing usually solves it. Are you using the port-forward method from the Argo CD documentation to access the UI? Were you able to deploy the individual application specs?
I am using port-forward to access the UI. I managed to deploy the individual apps manually and they show up in the Applications tab, but they have the same problem as Kubeflow: Status Healthy, Sync Unknown, git-repo-secret error.
Hello @mtszkw
The git-repo-secret holds the username/password that Argo CD needs to connect to GitHub to access the manifests. You should have one external-secrets pod running in the kube-system namespace. It pulls the secret from AWS Secrets Manager and puts it inside the cluster. Try checking the logs of this pod. If the secrets are not there, Argo CD is unable to fetch the manifests.
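The two checks suggested above (does the secret exist in the cluster, and what do the external-secrets logs say) can be wrapped in a couple of shell helpers. This is a minimal sketch: the function names are hypothetical, and the Deployment name external-secrets in kube-system is an assumption inferred from the pod name that appears later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of argoflow-aws) for checking whether the
# external-secrets controller has created Argo CD's repo secret.

# Prints "present" if git-repo-secret exists in the given namespace
# (default: argocd), otherwise "missing".
check_git_repo_secret() {
  local ns="${1:-argocd}"
  if kubectl get secret git-repo-secret -n "$ns" >/dev/null 2>&1; then
    echo "present"
  else
    echo "missing"
  fi
}

# Shows the most recent log lines (default: 100) of the external-secrets
# controller. Assumes it runs as the Deployment "external-secrets" in
# kube-system, as the pod name external-secrets-6cbb666466-zvkxs suggests.
external_secrets_logs() {
  kubectl logs -n kube-system deploy/external-secrets --tail="${1:-100}"
}
```

For example, run `check_git_repo_secret`, and if it prints "missing", look for polling failures with `external_secrets_logs 200 | grep 'failure while polling'`. Pod events alone will not show these errors; they only appear in the logs.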
Hi @GetOn4. This pod is up and running with no suspicious events, although one thing I noticed in the pod logs is:
Environment:
AWS_ROLE_ARN: <<__role_arn.external_secrets__>>
I think this value was missing from the config and could not be substituted properly (this PR seems to fix it).
If the pod is up and running without errors, it should get the secrets from AWS Secrets Manager and create them in the cluster. I got some permission errors when setting this up.
You can check it:
kubectl get secret git-repo-secret -n argocd
kubectl get secret git-repo-secret -n argocd
Error from server (NotFound): secrets "git-repo-secret" not found
That is exactly the error I see in the Argo CD UI for each running application. I do see other secrets, though:
argocd argocd-application-controller-token-vr577 kubernetes.io/service-account-token 3 63m
argocd argocd-dex-server-token-swf8k kubernetes.io/service-account-token 3 63m
argocd argocd-initial-admin-secret Opaque 1 62m
argocd argocd-redis-token-nz2gj kubernetes.io/service-account-token 3 63m
argocd argocd-secret Opaque 5 63m
argocd argocd-server-token-rh5nf kubernetes.io/service-account-token 3 63m
argocd default-token-kwhgr kubernetes.io/service-account-token 3 63m
...
You won't see an error there, since the deployment itself is fine. You should see some errors in the external-secrets pod.
The file argoflow-aws/distribution/argocd/secret.yaml describes the secrets. Further deployments only work once Argo CD gets these secrets.
The file in argoflow-aws/distribution/argocd/secret.yaml describes the secrets.
So in this file, again, roleArn was not substituted successfully:
roleArn: <<__role_arn.external_secrets.argocd__>>
Is that the cause of the problem I am having?
That causes problems. You should set it.
Are you sure the Argo CD deployment is fine? Can you access the dashboard?
Also, do you have the external-secrets pod running in kube-system?
kubectl get pods -n kube-system
Yes, the external-secrets pod is running in kube-system and has no error events (https://github.com/argoflow/argoflow-aws/issues/84#issuecomment-850291729). Argo CD seems fine: I can access the dashboard and see applications running, but: https://github.com/argoflow/argoflow-aws/issues/84#issuecomment-850281558
I will now set role_arn.external_secrets.argocd, update the environment and see what happens.
That's weird. You should have the secrets if no errors are shown.
Ok @GetOn4, after re-running everything, there is still no git-repo-secret in the argocd namespace, and the application conditions show: ComparisonError secret "git-repo-secret" not found
Could you please show me the logs from the external-secret pod?
Oh, I was only looking at the events history and forgot about the logs. That makes more sense now:
{"level":50,"message_time":"2021-05-28T10:51:50.114Z","pid":18,"hostname":"external-secrets-6cbb666466-zvkxs","payload":{"message":"Missing credentials in config, if using AWS_CONFIG_FILE, set AWS_SDK_LOAD_CONFIG=1","code":"CredentialsError","time":"2021-05-28T10:51:50.113Z","requestId":"fb050f26-d742-4b55-b765-d8bc65adefd6","statusCode":403,"retryable":false,"retryDelay":0.2847250004510915,"originalError":{"message":"Could not load credentials from ChainableTemporaryCredentials","code":"CredentialsError","time":"2021-05-28T10:51:50.113Z","requestId":"fb050f26-d742-4b55-b765-d8bc65adefd6","statusCode":403,"retryable":false,"retryDelay":0.2847250004510915,"originalError":{"message":"User: arn:aws:sts::XXXX:assumed-role/ecas_argoflow_test2021052810242172600000000c/i-046453dc839a8d422 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXX:role/ecas_argoflow_test-admin","code":"AccessDenied","time":"2021-05-28T10:51:50.113Z","requestId":"fb050f26-d742-4b55-b765-d8bc65adefd6","statusCode":403,"retryable":false,"retryDelay":0.2847250004510915}}},"msg":"failure while polling the secret argocd/git-repo-secret"}
@mtszkw if you are using "Option 2" as described in the README, please see the updated instructions here: https://github.com/argoflow/argoflow-aws/pull/87
The policy to allow the external-secret IRSA role to assume the roles for each specific secret was missing.
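For illustration only: the missing piece described above is an IAM identity policy on the external-secrets IRSA role that grants sts:AssumeRole on each per-secret role. The shell sketch below prints such a policy for a list of role ARNs. The function name assume_role_policy is hypothetical, the linked PR remains the authoritative fix, and the sketch assumes the target roles' trust policies already allow the external-secrets role as a principal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a minimal IAM identity policy (JSON) allowing
# the caller to assume every role ARN passed as an argument.
assume_role_policy() {
  local resources="" arn
  for arn in "$@"; do
    # Quote each ARN and join them with commas.
    resources="${resources:+$resources,}\"$arn\""
  done
  printf '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"sts:AssumeRole","Resource":[%s]}]}\n' "$resources"
}
```

The output could then be attached as an inline policy, with the upper-case names as placeholders: `aws iam put-role-policy --role-name EXTERNAL_SECRETS_ROLE --policy-name allow-assume-secret-roles --policy-document "$(assume_role_policy arn:aws:iam::ACCOUNT_ID:role/SECRET_ROLE)"`.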
@mtszkw Not sure if you are still having this problem, but I believe removing this section will remove the need for a secret when using a public repository.
I guess we should probably make the base Argo CD spec work for public repos and then add an overlay that requires credentials for private ones.
@DavidSpek @karlschriek I've paused this project for the moment and will probably come back to it after the weekend.
Hi @DavidSpek @karlschriek and others,
I just forked the argoflow-aws repo, configured it and deployed it to my AWS account. I wanted to use a pretty basic configuration (in kustomization.yaml), i.e. no external domain, auth, etc. I managed to get argoflow up and running, but I cannot see any Kubeflow-related pods or services (I was particularly looking for the ingress gateway serving the Kubeflow UI). Could you help me? What am I missing?
Kustomization.yaml:
Output: