argoproj-labs / argocd-notifications

Notifications for Argo CD
https://argocd-notifications.readthedocs.io/
Apache License 2.0

Failed to notify recipient #347

Open hroyg opened 3 years ago

hroyg commented 3 years ago

Summary

Ever since we upgraded Argo CD to v2.1.3, we have been getting errors from argocd-notifications, which in turn cause API calls and Slack messages to fail and never be sent. With the upgrade, we also changed how GitHub authentication is configured: the repository URL and credentials are now defined in a secret, instead of in the ConfigMap with a reference to a secret as before.

This does not happen all the time; it seems to happen randomly with some applications. What I don't understand, probably because I don't know exactly how argocd-notifications works, is why this started after the Argo CD upgrade. Isn't the function that fails (<call .repo.GetCommitMetadata .app.status.operationState.syncResult.revision>) executed by argocd-notifications itself?

Diagnostics

eks

argocd: 2.1.3 argocd notifications: v1.1.1


time="2021-10-14T11:26:40Z" level=error msg="Failed to notify recipient {jenkins } defined in app argocd/monitoring: template: jenkins-api-calljenkins:1:27: executing \"jenkins-api-calljenkins\" at <call .repo.GetCommitMetadata .app.status.operationState.syncResult.revision>: error calling call: rpc error: code = Internal desc = Failed to fetch 8179a397e623f56c7f36b4a5781ad233af2bbe5b: `git fetch origin --tags --force` failed exit status 128: fatal: could not read Username for 'https://github.com': No such device or address" app=argocd/monitoring

time="2021-10-14T11:26:40Z" level=error msg="Failed to notify recipient {slack cloud-cd-stage} defined in app argocd/monitoring: template: custom-synced-and-healty:5:26: executing \"custom-synced-and-healty\" at <call .repo.GetCommitMetadata .app.status.operationState.syncResult.revision>: error calling call: rpc error: code = Internal desc = Failed to fetch 8179a397e623f56c7f36b4a5781ad233af2bbe5b: `git fetch origin --tags --force` failed exit status 128: fatal: could not read Username for 'https://github.com': No such device or address" app=argocd/monitoring

Any input that helps resolve the issue would be much appreciated.

Thanks

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

ryota-sakamoto commented 2 years ago

@hroyg I think we need to break down the issue.

  1. If you downgrade Argo CD to the old version, is the problem fixed?
  2. Is this an Argo CD version problem, an argocd-notifications problem, or something else?

hroyg commented 2 years ago

@ryota-sakamoto

  1. Downgrading Argo CD back to v2.0.3 resolved the issue (when downgrading, I also changed the way I pass the GitHub authentication back to the ConfigMap). We changed the authentication mechanism back to the CM as follows:

repositories: |

** Staying on Argo CD v2.1.3 (not downgrading) and just changing the config to be as above (the old way, as we used it before the upgrade to v2.1.3) also resolved the issue. So I guess the problem relates to the new way Argo CD passes the GitHub PAT password, or maybe to how argocd-notifications uses it (I'm not too familiar with how argocd-notifications connects to GitHub through the Argo CD server/repo-server).

  2. The problem started after upgrading Argo CD, but the error appears in the argocd-notifications controller.

** We haven't encountered any functionality issues in Argo CD itself with the new way of authenticating to GitHub (the new way is to pass the repo name and authentication, e.g. a PAT, as secrets).

Thumbiceq commented 2 years ago

Same issue: it does not work with Argo CD v2.1.1, and downgrading to v2.0.5 helped.

(call .repo.GetAppDetails).Helm.GetParameterValueByName is randomly failing with Failed to fetch 023d82c5c49bdc9aa05ac32801d2800e900ff7c0: 'git fetch origin --tags --force'

Also, when two notification services are defined and the same function is used in both service templates ((call .repo.GetAppDetails).Helm.GetParameterValueByName), it fails only for the first one. The second is sent normally without any errors.

slack notification service template:

...
          {
            "title": "Upstream Repository",
            "value": "{{ (call .repo.GetAppDetails).Helm.GetParameterValueByName "app-vue.labels.upstreamRepository" }}",
            "short": true
          },
...

Webhook notification service template:

...
path: /api/v4/projects/{{(call .repo.GetAppDetails).Helm.GetParameterValueByName "app-vue.labels.upstreamRepository"}}/statuses/{{(call .repo.GetAppDetails).Helm.GetParameterValueByName "app-vue.labels.commitSHA"}}?state=success
...

ryota-sakamoto commented 2 years ago

I reproduced this issue and am now investigating it.

mbolek commented 2 years ago

I think this is the same issue: https://github.com/argoproj-labs/argocd-notifications/issues/356. I'm also affected by this, but not always. For whatever reason, on some occasions I get the Failed to fetch error, and on others the notifications work and get data from the commit.

agaudreault commented 2 years ago

The reason this broke without updating argocd-notifications is that it calls the argocd-repo-server service to get the information, and it is actually that service performing the call and failing. https://github.com/argoproj-labs/argocd-notifications/blob/c461d624b4c02452e85821361bb1c4c2d2e487b7/shared/argocd/service.go#L74

There is also a cache mechanism in argocd-repo-server, so that might explain why the notification sometimes goes through.

agaudreault commented 2 years ago

For reference, we have the same problem, and our Argo CD instance is configured for our private repositories with a credential template and a GitHub App according to https://argo-cd.readthedocs.io/en/stable/user-guide/private-repositories/#github-app-credential. (Using 2.1.3)

argocd repocreds list
URL PATTERN              USERNAME  SSH_CREDS  TLS_CREDS
https://github.com/org   -         false      false

tsunamishaun commented 2 years ago

I was hoping the new release v1.2.1/ #370 that fixed my similar issue #356 would have also fixed this (which I am now seeing clearly after upgrading).

My specific errors are around de-referencing the Application object attributes in the trigger (when and oncePer clause). I have no way to gracefully handle these as I do in the templates (setting default values in case they don't exist).

time="2021-12-15T22:01:28Z" level=error msg="failed to execute oncePer condition: cannot fetch images from <nil> (1:20)\n | app.status.summary.images\n | ...................^"
time="2021-12-15T22:01:28Z" level=error msg="failed to execute when condition: cannot fetch phase from <nil> (1:27)\n | app.status.operationState.phase in ['Error', 'Failed']\n | ..........................^"
time="2021-12-15T22:01:28Z" level=error msg="failed to execute oncePer condition: cannot fetch syncResult from <nil> (1:27)\n | app.status.operationState.syncResult.revision\n | ..........................^"

Any help appreciated.
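The errors above all come from reading a field through a parent that is nil (for example `operationState` on an application that has not synced yet). The trigger conditions are evaluated by an expression engine rather than by Go code, so the following only sketches the general guard pattern of checking each step before descending; `lookup`, `syncPhase`, and the "Unknown" default are hypothetical.

```go
package main

import "fmt"

// lookup walks a nested map along path and reports whether every step existed.
func lookup(obj map[string]interface{}, path ...string) (interface{}, bool) {
	var cur interface{} = obj
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			// The parent is nil or not an object: stop instead of failing.
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// syncPhase returns app.status.operationState.phase, or a default when any
// part of that path is missing.
func syncPhase(app map[string]interface{}) string {
	if v, ok := lookup(app, "status", "operationState", "phase"); ok {
		if s, isString := v.(string); isString {
			return s
		}
	}
	return "Unknown"
}

func main() {
	neverSynced := map[string]interface{}{"status": map[string]interface{}{}}
	failed := map[string]interface{}{
		"status": map[string]interface{}{
			"operationState": map[string]interface{}{"phase": "Failed"},
		},
	}
	fmt.Println(syncPhase(neverSynced))
	fmt.Println(syncPhase(failed))
}
```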

agaudreault commented 2 years ago

After upgrading to v1.2.1 with Argo CD v2.2.3, I am able to use .repo.GetCommitMetadata.

ichasco-heytrade commented 2 years ago

Hi, I am using in the slack templates:

...
{
              "title": "Author",
              "value": "{{(call .repo.GetCommitMetadata .app.status.sync.revision).Author}}",
              "short": true
            }
...

It works, but it gives me the following error:

argocd-notifications-controller-769fb8f4fd-ptgql argocd-notifications-controller time="2022-03-08T09:30:24Z" level=error msg="Failed to notify recipient {slack releases-dev} defined in resource argocd/service1: template: app-deployed:23:17: executing \"app-deployed\" at <call .repo.GetCommitMetadata .app.status.sync.revision>: error calling call: rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: tls: first record does not look like a TLS handshake\"" resource=argocd/service1

I think this is related to argocd-repo-server. But I don't want to disable TLS there, because I would then also have to disable it in the server and application-controller. Is there any other way to show the author of the commit?

I am using these versions:

argocd: 2.2.5 argocd-notifications: 1.2.1

Thanks

muhammad-asn commented 2 years ago

Is there any update on this? The issue is still open and seems nowhere near a solution.

mubarak-j commented 2 years ago

If this issue hasn't been resolved in the latest Argo CD v2.4.0, then I think it should be resubmitted upstream, because the notifications code was merged into the Argo CD repo. I doubt issues here are still monitored or being worked on.

sinkr commented 2 years ago

I'm unsure how to follow @mubarak-j 's advice above, so I'll throw on here that this is still occurring in v2.4.11+3d9e9f2.