Closed: jcogilvie closed this 2 weeks ago
For a sample size of one, I had some better luck with this after setting grpc_web = true
on the provider config. I'll see if it recurs, but further validation would be helpful.
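For reference, that change looks roughly like this on the provider block (the server address and token below are placeholders, not my actual values):
provider "argocd" {
  server_addr = "argocd.example.com:443" # placeholder
  auth_token  = var.argocd_auth_token    # placeholder
  grpc_web    = true                     # route calls over gRPC-web instead of raw gRPC
}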
@jcogilvie you mentioned here that the issue is repeatable. Any chance you can share that config?
Well, there's a lot of terraform machinery around how it's actually configured, but I can give you a generalized lay of the terraform land, plus the app manifest that ends up being applied. I hope that's close enough.
Here's the (minimized) manifest:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: crm-pushback
  namespace: argocd
spec:
  destination:
    namespace: crm-pushback
    server: https://kubernetes.default.svc
  project: crm-pushback
  revisionHistoryLimit: 10
  sources:
    - chart: mycompany-api-service
      helm:
        releaseName: api
        values: |
          enabled: true
          otherValues: here
      repoURL: https://mycompany.helm.repo/artifactory/default-helm/
      targetRevision: ~> 2.2.0
    - chart: mycompany-consumer
      helm:
        releaseName: first-query-complete-receiver
        values: |
          enabled: true
          otherValues: here
      repoURL: https://mycompany.helm.repo/artifactory/default-helm/
      targetRevision: ~> 2.2.0
    - chart: mycompany-consumer
      helm:
        releaseName: first-status-poller
        values: |
          enabled: true
          otherValues: here
      repoURL: https://mycompany.helm.repo/artifactory/default-helm/
      targetRevision: ~> 2.2.0
    - chart: mycompany-consumer
      helm:
        releaseName: second-query-complete-receiver
        values: |
          enabled: true
          otherValues: here
      repoURL: https://mycompany.helm.repo/artifactory/default-helm/
      targetRevision: ~> 2.2.0
    - chart: mycompany-consumer
      helm:
        releaseName: second-status-poller
        values: |
          enabled: true
          otherValues: here
      repoURL: https://mycompany.helm.repo/artifactory/default-helm/
      targetRevision: ~> 2.2.0
    - chart: mycompany-cronjob
      helm:
        releaseName: syncqueries
        values: |
          enabled: true
          otherValues: here
      repoURL: https://mycompany.helm.repo/artifactory/default-helm/
      targetRevision: ~> 2.2.0
  syncPolicy:
    automated: {}
    retry:
      backoff:
        duration: 30s
        factor: 2
        maxDuration: 2m
      limit: 5
It's built through this tf module:
resource "argocd_repository" "this" {
repo = data.github_repository.this.http_clone_url
project = argocd_project.this.metadata[0].name
lifecycle {
# these get populated upstream by argo
ignore_changes = [githubapp_id, githubapp_installation_id]
}
}
locals {
  helm_repo_url = "https://mycompany.helm.repo/artifactory/default-helm/"

  multiple_sources = [for source in var.services : {
    repo_url        = local.helm_repo_url
    chart           = source.source_chart
    path            = source.local_chart_path != null ? source.local_chart_path : ""
    target_revision = source.local_chart_path != null ? var.target_infra_revision : source.source_chart_version
    helm = {
      release_name = source.name
      values       = source.helm_values
    }
  }]

  sources     = local.multiple_sources
  sources_map = { for source in local.sources : source.helm.release_name => source }
}
resource "argocd_project" "this" {
metadata {
name = var.service_name
namespace = "argocd"
labels = {}
annotations = {}
}
spec {
description = var.description
source_namespaces = [var.namespace]
source_repos = [data.github_repository.this.html_url, local.helm_repo_url]
destination {
server = var.destination_cluster
namespace = var.namespace
}
role {
name = "owner"
description = "Owner access to ${var.service_name}. Note most operations should be done through terraform."
policies = [
...
]
groups = [
...
]
}
}
}
locals {
  sync_policy = var.automatic_sync_enabled ? {
    automated = {
      allowEmpty = false
      prune      = var.sync_policy_enable_prune
      selfHeal   = var.sync_policy_enable_self_heal
    }
  } : {}
}
resource "argocd_application" "this" {
count = var.use_raw_manifest ? 0 : 1
wait = var.wait_for_sync
metadata {
name = var.service_name
namespace = "argocd"
labels = {} # var.tags -- tags fail validation because they contain '/'
}
spec {
project = argocd_project.this.metadata[0].name
destination {
server = var.destination_cluster
namespace = var.namespace
}
dynamic "source" {
for_each = local.sources_map
content {
repo_url = source.value.repo_url
path = source.value.path
chart = source.value.chart
target_revision = source.value.target_revision
helm {
release_name = source.value.helm.release_name
values = source.value.helm.values
}
}
}
sync_policy {
dynamic "automated" {
for_each = var.automatic_sync_enabled ? {
automated_sync_enabled = true
} : {}
content {
allow_empty = false
prune = var.sync_policy_enable_prune
self_heal = var.sync_policy_enable_self_heal
}
}
retry {
limit = var.sync_retry_limit
backoff {
duration = var.sync_retry_backoff_base_duration
max_duration = var.sync_retry_backoff_max_duration
factor = var.sync_retry_backoff_factor
}
}
}
}
}
Note that for this specific case, creation doesn't get too_many_pings, but any kind of update does (e.g., updating the image in the sources).
Making the application sufficiently bigger can cause too_many_pings on create as well. One of my apps has something like 30 sources, which was just too much for the provider (maybe for the CLI?), so I had to skip the provider and go straight to a kubernetes_manifest, which was somewhat disappointing (though quick).
Bumping this; I'm experiencing the same problem when deploying, and sometimes it happens seemingly at random. @jcogilvie how do you skip the provider using kubernetes_manifest? Thanks!
Bump, I'm also experiencing this on provider version 5.6.0
This happens for me on creation of argocd_cluster.
edit: Upgrading the provider to 6.0.3 does not seem to resolve the issue.
@amedinagar I used a kubernetes_manifest resource with argo's declarative configuration.
There are a few gotchas:
1) make sure you add finalizers;
2) you'll probably want a wait statement similar to this:
wait {
  fields = {
    "status.sync.status" = "Synced"
  }
}
3) the kubernetes_manifest provider has issues with the argo CRDs as of argo 2.8, when the schema changed to introduce a field with x-kubernetes-preserve-unknown-fields on it. So, my CRDs are presently stuck on argo 2.6.7.
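For anyone who wants a starting point, here's a minimal sketch of that workaround, assuming the Application shown earlier in the thread (names and values are illustrative, and I've trimmed the sources list to a single entry):
resource "kubernetes_manifest" "argocd_application" {
  manifest = {
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "Application"
    metadata = {
      name      = "crm-pushback"
      namespace = "argocd"
      # gotcha 1: add the finalizer so deletes cascade
      finalizers = ["resources-finalizer.argocd.argoproj.io"]
    }
    spec = {
      project = "crm-pushback"
      destination = {
        namespace = "crm-pushback"
        server    = "https://kubernetes.default.svc"
      }
      sources = [
        {
          chart          = "mycompany-api-service"
          repoURL        = "https://mycompany.helm.repo/artifactory/default-helm/"
          targetRevision = "~> 2.2.0"
          helm = {
            releaseName = "api"
            values      = "enabled: true"
          }
        }
      ]
    }
  }

  # gotcha 2: block until Argo reports the app as synced
  wait {
    fields = {
      "status.sync.status" = "Synced"
    }
  }
}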
Will revisit once GRPCKeepAliveEnforcementMinimum is made configurable in the underlying argocd module. Related to https://github.com/argoproj/argo-cd/issues/15656
This is also happening to me when I use the ArgoCD CLI to do the app sync.
@onematchfox looks like the upstream PR has been merged making the keepalive time configurable.
Yeah, I see that. Although we will need to wait for this to actually be released (at a glance, the PR was merged into main, so it will only land in the 2.9 release - feel free to correct me if I'm wrong), and then it will take some consideration as to how we implement it here, given that we need to support older versions as well.
Looks like 2.9 is released. What kind of consideration are we talking about here? How tightly is the client library coupled to the API?
Perusing the upstream PR, it looks like the server and the API client both expect an environment variable to be set (via common).
So, if I'm understanding correctly, the new env var is something we can set in the client process, and it'll simply be ignored in the event we need to use an older client lib version.
Given the implementation, I actually wonder if setting it here in a new client would also fix the issue when running against an older server version as well.
I'm still seeing this frequently when connecting to Argo 2.9.2. What's the status on moving to the new library?
Any chance this gets looked at soon, @onematchfox? Is there anything we can do to help?
I am also experiencing this issue, but when adding a new cluster. I'm happy to provide any additional information to help resolve this. Some additional details: I'm using provider version 6.0.3.
ArgoCD information:
{
  "Version": "v2.9.6+ba62a0a",
  "BuildDate": "2024-02-05T11:24:01Z",
  "GitCommit": "ba62a0a86d19f71a65ec2b510a39ea55497e1580",
  "GitTreeState": "clean",
  "GoVersion": "go1.20.13",
  "Compiler": "gc",
  "Platform": "linux/amd64",
  "KustomizeVersion": "(devel) unknown",
  "HelmVersion": "v3.14.0+g3fc9f4b",
  "KubectlVersion": "v0.24.17",
  "JsonnetVersion": "v0.20.0"
}
After switching to the official ArgoCD Helm chart from the Bitnami chart and updating to version 2.10, this issue has gone away for me.
Actually it seems like this only got released with 2.10. I wonder if it is just enough to run a 2.10 server, as @donovanrost seems to have done.
I have a strong suspicion that my case was somehow related to me having an entirely-too-large helm repo index file (~80 megs).
Hey folks, sorry for the lack of response here. As of v6.1.0 the provider now imports v2.9.9 of argoproj/argocd. I do suspect that this issue is mostly server side, so you may need to update your Argo instance to v2.10, as @donovanrost suggested. But if that doesn't work, then we're certainly open to PRs upgrading the deps in this provider to v2.10, since the changes to the client-side code didn't land in 2.9.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Terraform Version, ArgoCD Provider Version and ArgoCD Version
Affected Resource(s)
Terraform Configuration Files
A generic multi-source application w/ 6 sources; all of them Helm; all of them with values inline on their source object
Output
Steps to Reproduce
terraform apply
Expected Behavior
Update is applied
Actual Behavior
Failed with above error
Important Factoids
public argo endpoint to an EKS cluster
References
#124
Community Note