Open papanito opened 4 years ago
Have you tried to increase status-processors
, operation-processors
and repo-server-timeout-seconds
in your application-controller deployment as it was done in this issue https://github.com/argoproj/argo-cd/issues/1641 ?
@WaldoFR thanks for the hint. I've added the following
- --status-processors
- "20"
- --operation-processors
- "10"
- --repo-server-timeout-seconds
- "180"
I am surprised as I only have 3 apps so far and I only sync them manually for now, so I would expect syncing one app should work and no timeout occurs
Do you still face the synchronization problems ?
No seems better now, apps show OutOfSynch
which is fine. What helped was the increase of the timeout to 180
So it only temporarily worked - I got at least a status OutOfSync
which was correct and after a sync I got the message Synch OK
. But eventually I get again the same problem. Status shows Unknown
and I can see the error again. Also when I try to check the app-diff I get Unable to load data: ComparisonError: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Ok, we face the same problem (in v1.5.3), and increasing number of argocd-repo-server
replicas and changing --repo-server-timeout-seconds
does not help neither.
time="2020-06-29T14:03:41Z" level=error msg="finished unary call with code Unknown" error="Get \"https://private-repo.com/projet.deploy/project/project.git/info/refs?service=git-upload-pack\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" grpc.code=Unknown grpc.method=GenerateManifest grpc.request.deadline="2020-06-29T14:04:26Z" grpc.service=repository.RepoServerService grpc.start_time="2020-06-29T14:03:26Z" grpc.time_ms=15002.009 span.kind=server system=grpc
Seems that some other people are facing issue ( i.e. https://github.com/argoproj/argo-cd/issues/3547#issuecomment-649593167 )
Related to this issue (https://github.com/argoproj/argo-cd/issues/2957) may I ask you if you are currently deploying helm3 charts on your cluster ? It seems that we face this problems only on clusters with helm3 charts but we can't ensure it's related.
Regards,
EDIT : Additionnal informations, sometimes repository that are not syncing are synchronized for a few minutes.
I have a single cluster and yes I have some stuff like cert-manager deployed with helm 3
If I switch back to v.1.5.0 I don't have the issue and sync work fine - updated the description with the new info.
It's does not seem it's the version, I have now the same issue with v.1.5.0
So in my case is that I get a 10.42.67.206:56842->10.43.73.252:6379: i/o timeout
whereas the ip is the cluster ip of redis-server
kubectl -n argocd describe service argocd-redis
Name: argocd-redis
Namespace: argocd
Labels: app.kubernetes.io/component=redis
app.kubernetes.io/instance=argocd
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=argocd-redis
app.kubernetes.io/part-of=argocd
helm.sh/chart=argo-cd-2.5.0
Annotations: Selector: app.kubernetes.io/name=argocd-redis
Type: ClusterIP
IP: 10.43.73.252
Port: <unset> 5379/TCP
TargetPort: 5379/TCP
Endpoints: 10.42.112.147:5379
Session Affinity: None
Events: <none>
I completely removed argocd and installed it in a new namespace. So far it seems good and I don't have the issue mentioned. Still not clear why I had it in the first place.
Have you proceed to auto-sync same deployments of your previous ArgoCD setup ?
@WaldoFR no at the moment I don't use auto-sync
In my case I found that all commands were stopped at 15s exactly, browsing the code I found this part here that would explain we stop synchronization after 15 seconds :
// Returns a HTTP client object suitable for go-git to use using the following
// pattern:
// - If insecure is true, always returns a client with certificate verification
// turned off.
// - If one or more custom certificates are stored for the repository, returns
// a client with those certificates in the list of root CAs used to verify
// the server's certificate.
// - Otherwise (and on non-fatal errors), a default HTTP client is returned.
func GetRepoHTTPClient(repoURL string, insecure bool, creds Creds) *http.Client {
// Default HTTP client
var customHTTPClient = &http.Client{
// 15 second timeout
Timeout: 15 * time.Second,
// don't follow redirect
CheckRedirect: func(req *http.Request, via []*http.Request) error {
return http.ErrUseLastResponse
},
}
Git client seems to not handle timeout specified in --repo-server-timeout-seconds
so, even if the GRPC request is extended the git fetch is not. Should this option also be applied to git client ?
Regards,
References : https://github.com/argoproj/argo-cd/blob/e92e0fa4090a0c324b7a7dbf3d7a2e16ab281d85/util/git/client.go#L112
I have the same issue with helm3, argocd 1.5.3 and cert-manager 0.16.1
error="application refresh deadline exceeded"
apiVersion: apps/v1
kind: Deployment
metadata:
name: argocd-application-controller
spec:
template:
spec:
containers:
- name: argocd-application-controller
command:
- argocd-application-controller
- --status-processors
- "20"
- --operation-processors
- "50"
- --repo-server-timeout-seconds
- "300"
argocd/argocd-server-5b9466d7f-x62db[argocd-server]: time="2020-08-25T16:04:33Z" level=error msg="finished unary call with code Unknown" error="application refresh deadline exceeded" grpc.code=Unknown grpc.method=ResourceTree grpc.service=application.ApplicationService grpc.start_time="2020-08-25T16:03:33Z" grpc.time_ms=60003.48 span.kind=server system=grpc
and I have only dozen applications in control repo ...
I have the same issue while adding a new https repository from bitbucket and gitlab:
Having following error message:
Unable to connect HTTPS repository: Get "https://bitbucket.org/xxx/yyy/info/refs?service=git-upload-pack": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I enabled --insecure flag for argocd-server
These are the settings applied to the argocd-application-controller:
- command:
- argocd-application-controller
- --status-processors
- "20"
- --operation-processors
- "50"
- --repo-server-timeout-seconds
- "420"
- --app-resync
- "600"
I also tried to scale up argocd-repo-server (5 replicas) and it did not work.
git clone inside argocd-repo-server is working fine.
Installed following ArgoCD versions and had the same issue for all of them:
My cluster has 3 nodes and 1 master.
This is happening for manual and auto sync policies while creating a new app.
My argocd-repo-server logs:
time="2020-08-27T09:25:05Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
time="2020-08-27T09:25:05Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --generate-key /tmp/gpg-key-recipe563578480" dir= execID=mX2zE
time="2020-08-27T09:25:06Z" level=info msg="Populating GnuPG keyring with keys from /app/config/gpg/source"
time="2020-08-27T09:25:06Z" level=info msg="gpg --no-permission-warning --list-public-keys" dir= execID=iYECJ
time="2020-08-27T09:25:06Z" level=info msg="gpg --no-permission-warning -a --export 22088F9D0E59A460" dir= execID=k0Ftj
time="2020-08-27T09:25:06Z" level=info msg="gpg-wrapper.sh --no-permission-warning --list-secret-keys 22088F9D0E59A460" dir= execID=grdLd
time="2020-08-27T09:25:06Z" level=info msg="Loaded 0 (and removed 0) keys from keyring"
time="2020-08-27T09:25:06Z" level=info msg="argocd-repo-server v1.7.1+da5fa74 serving on [::]:8081"
I have got a bunch of no permission warning.
I have got a bunch of no permission warning.
These are no warnings. It's just a log of a command that got executed, which happen to be executed using a parameter --no-permission-warning
This is happening for manual and auto sync policies while creating a new app.
Hi @cemsusal, is it happening when you create an app or when you add the repository? For the latter, an initial connection attempt is made by argocd-server
(and not the repo server). Is the argocd-server pod allowed to speak to Internet hosts or are you behind a HTTP proxy maybe?
This is happening while creating and app or adding a repository from bitbucket or gitlab. (Did not try github)
ARGOCD_GPG_ENABLED -> Is now set the false and issue still persists.
We are not behind a proxy. argocd-server can access to the internet.
This is a log from our repo-server:
time="2020-08-27T12:10:54Z" level=error msg="finished unary call with code Unknown" error="Get \"https://zzz:***@bitbucket.org/xxx/yyy.git/info/r efs?service=git-upload-pack\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" grpc.code=Unknown grpc.method=GenerateManifest grpc.req uest.deadline="2020-08-27T12:17:19Z" grpc.service=repository.RepoServerService grpc.start_time="2020-08-27T12:10:39Z" grpc.time_ms=15001.042 span.kind=server syst em=grpc
This is confirming that repo server is failing to fetch.
@jannfis this is the output of git commands:
argocd@argocd-repo-server-6ddb994d9-s49gf:/tmp/https:__xxx:yyy@bitbucket.org_project$ git remote -v
origin https://ttt:zzz@bitbucket.org/xxx/yyy.git (fetch)
origin https://ttt:zzz@bitbucket.org/xxx/yyy.git (push)
argocd@argocd-repo-server-6ddb994d9-s49gf:/tmp/https:__xxx:yyy@bitbucket.org_project$ git pull origin master
From https://bitbucket.org/xxx/yyy
* branch master -> FETCH_HEAD
Already up to date.
I have only one application on top of a fresh install of v.1.7.1. The issue still persists.
Installing Argo CD in another namespace did not resolve the issue for me.
it may depends on this https://github.com/argoproj/argo-cd/issues/3998 As new version of cert-manager has a huge manifest file (>1MB)
We're experiencing something similar in v1.7.6.
Our application controller started spewing this log this morning:
patch: {"status":{"conditions":[{"lastTransitionTime":"2020-12-03T12:33:40Z","message":"rpc error: code = Unknown desc = Get \"https://github.com/Bandwidth/some-repo/info/refs?service=git-upload-pack\": dial tcp 140.82.113.3:443: i/o timeout (Client.Timeout exceeded while awaiting headers)","type":"ComparisonError"}]}}
--
The reason being our network is saturated and so packets are being dropped, connections timing out, etc.
So while this kind of behavior is expected during a network issue, I'm concerned as to the impact its having on application syncs.
What bothers me, is that the repo server hasn't thrown a single error and everything it needs is still in its cache. So why is the app controller timing out on syncs and putting a bunch of apps in the Unknown
state, when it has everything it needs to do its job?
What is the point of the app controller hitting the Git repo with git-upload-pack
?
same problem with v1.8.3+0f9c684
Again, git timeout might be caused by slowness not related to ArgoCD. For instance the version control server you are using itself or tiers services related for authentication on it.
To troubleshoot that, ensure that realizing a git clone from your node with sames credentials as you configured won't last over 15 seconds.
What if the git timeout is definitely caused by slowness not (directly) related to ArgoCD ?
To be more precise, what if the slowness is caused by rate limiting? Then argocd will start retrying all the calls that are rate-limited for more than 15 seconds, thus hammering the git server with more and more requests, thereby indirectly causing the rate limiting to increase.
It would certainly be very useful if it were possible to change that timeout setting, in that situation.
(PS: Sorry if this comes across emotionally, I just had a bad day trying to figure out why our production applications suddenly weren't updating any more, and why azuregit was suddenly refusing to talk to argocd.)
thanks a lot man, i follow this and i fix my issue. before i googling 3 days to fixed it.
We're experiencing the same timeout issues after deploying an ApplicationSet which creates 9 applications.
level=info msg="Normalized app spec: {\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2022-01-24T19:20:45Z\",\"message\":\"rpc error: code = Unknown desc = `helm dependency build` failed timeout after 1m30s\",\"type\":\"ComparisonError\"}]}}" application=app
ArgoCD version deployed: v2.2.2+03b17e0 ArgoCD ApplicationSet version: v0.3.0
Controller settings:
controller:
args:
# -- define the application controller `--status-processors`
statusProcessors: "60"
# -- define the application controller `--operation-processors`
operationProcessors: "40"
# -- define the application controller `--repo-server-timeout-seconds`
repoServerTimeoutSeconds: "360"
GitHub API rate limits are 5000 API calls per hour, or ~83 calls per minute. I'm not knowledgeable enough about ArgoCD engine to say that we're hitting such but limits but I think it could be very plausible that such limits are being reached as I'm able to create the Applications one by one!
UPDATE: I've also added appResyncPeriod
together with the above configs and re-deployed. Being patient was key now! 🔑 After about 10minutes, none of my 9 applications where in an Unkown
state anymore
# - define the application controller `--app-resync`
appResyncPeriod: "300"
UPDATE 2: At the seconds refresh, the repo-server
started throwing timeouts again!
level=error msg="finished unary call with code Unknown" error="Manifest generation error (cached): `helm dependency build` failed timeout after 1m30s" grpc.code=Unknown grpc.method=GenerateManifest grpc.request.deadline="2022-01-25T14:18:48Z" grpc.service=repository.RepoServerService grpc.start_time="2022-01-25T14:12:48Z" grpc.time_ms=330.102 span.kind=server system=grpc
@mnasruul what was the fix?
@mnasruul what was the fix?
follow this https://github.com/argoproj/argo-cd/issues/3864#issuecomment-681846012
Maybe it could help to somebody as well. We have ArgoCD on AKS cluster and we faced similar issue. According https://argo-cd.readthedocs.io/en/release-1.8/operator-manual/high_availability/ we have tried to set workaround:
--repo-server-timeout-seconds
for argocd-application-controller and issue disapiered for now.
Checklist:
argocd version
.Describe the bug
I've added different apps to my argo-cd but all of the fail to synch with the same error
To Reproduce
Here is the manifest from one of the demo projects which fails
Remarks
Not sure if this matters however...
rke
and contains 3 nodes.Expected behavior
Synchronization works fine and without errors
Version
Logs
argocd-server
**argocd-application-controller***
I suspect the issue is cause if the
i/o timeout
For convenience, I've attached the logs from
argocd-server
andargocd-application-controller