repo-cache-expiration is ignored for helm repositories

De1taE1even commented 4 years ago

I've asked about this in slack several times over several days and no one has an answer. It doesn't help that I literally can't find one example of this flag actually getting used on the entirety of the internet.

Describe the bug

I recently transitioned all my argocd apps to point at a private helm repository we host, instead of pointing the applications back at our gitlab instance. This is because we're getting customers that need a private installation of our product, not managed by us. This has worked great for the most part, I really like sync'ing to a helm repo versus gitlab.

However, changes to the helm chart aren't picked up by argo because argo isn't re-pulling the helm chart from our chart repo. The only way I can get argo to re-pull is to issue a manual hard-refresh. But I can't get this to happen in any automated fashion. I've researched this extensively, and the only thing I could find is this repo-cache-expiration flag that you can set in the repo server command that will expire argo's redis cache and force it to re-pull the chart from source. I set this flag to 1m for testing. At first I didn't think I set it correctly, but then I saw that it was working, for the couple of apps I still had argo pointing to gitlab. But for the helm-based applications, this flag seems to be completely ignored. Nothing I've been able to do will convince argo to re-pull from our helm chart repo.

To Reproduce

Deploy an app in argocd that syncs to a helm chart hosted in an external repository
Make and commit a change to the chart, in-place (don't re-version it)
You'll notice that argocd won't pick up the change until you hard-refresh

Expected behavior

With the repo-cache-expiration flag set to 1m I'd expect a normal refresh of the app to re-pull the chart from source, but it doesn't.

Screenshots

Here's a snippet of my repo server deployment, for reference on how I'm setting the flag:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  template:
    spec:
      containers:
      - command:
        - uid_entrypoint.sh
        - argocd-repo-server
        - --repo-cache-expiration
        - 1m
        - --redis
        - argocd-redis:6379

Version

argocd: v1.5.4+36bade7
  BuildDate: 2020-05-05T19:02:56Z
  GitCommit: 36bade7a2d7b69d1c0b0c4d41191f792a847d61c
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v1.6.1+159674e
  BuildDate: 2020-06-19T00:41:05Z
  GitCommit: 159674ee844a378fb98fe297006bf7b83a6e32d2
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
  Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
  Kubectl Version: v1.14.0

darshanime commented 4 years ago

One way to get hard refreshes is to periodically add the argocd.argoproj.io/refresh=hard annotation on the application.

@alexmt do you think it would be nice to add a --app-hard-resync option (similar to --app-resync) that would perform a hard refresh after given duration? Default can be never.

arielhernandezmusa commented 4 years ago

@darshanime with argocd.argoproj.io/refresh=hard using helm repo not works

De1taE1even commented 4 years ago

@darshanime Yes that is one way to do it, but that defeats the purpose of CI and automated tools if you have to manually inject a k8s label. Sure I could automate the injection, but that's a really hacky solution. This is a pretty fundamental interaction that Argo should support. Whether it be by hard refresh, or some other option, right now ArgoCD simply doesn't support helm chart updates in place, since it never goes back out to pull the latest version of the chart.

arielhernandezmusa commented 4 years ago

What about sync a chart that only changes the app version!

De1taE1even commented 4 years ago

@arielhernandezmusa If I'm understanding you correctly, that wouldn't help me out. Nothing you can change on the helm chart would help, because Argo never re-pulls the chart itself, which is the problem at its core. Something has to tell Argo to, instead of using internal cache, go out and re-pull the helm chart from its source. Right now the only way to do that is a manual 'hard refresh'.

arielhernandezmusa commented 4 years ago

Okay, maybe what I say can be an improvement

r0bj commented 3 years ago

This issue is also related to more cases:

kustomize with remote base (e.g. the manifests of a git url tag)
custom plugin which fetches data from external sources (e.g.: vault secrets: https://github.com/IBM/argocd-vault-plugin)

It would be nice to have a way to cover those use cases without manual intervention (hard-refresh). I think this PR would be helpful here: https://github.com/argoproj/argo-cd/pull/4678

djvs commented 3 years ago

Related, manual syncs should probably always do a hard refresh. There's maybe a five minute lag with default application setup even if you manually go into the UI and hit Sync.

alwaysastudent commented 3 years ago

This issue is also related to more cases:

kustomize with remote base (e.g. the manifests of a git url tag)

custom plugin which fetches data from external sources (e.g.: vault secrets: https://github.com/IBM/argocd-vault-plugin)

It would be nice to have a way to cover those use cases without manual intervention (hard-refresh). I think this PR would be helpful here: #4678

I think we need a way to selectively and periodically hard refresh applications that have some sort of annotation in there.

De1taE1even commented 3 years ago

This issue is also related to more cases:

kustomize with remote base (e.g. the manifests of a git url tag)

custom plugin which fetches data from external sources (e.g.: vault secrets: https://github.com/IBM/argocd-vault-plugin)

It would be nice to have a way to cover those use cases without manual intervention (hard-refresh). I think this PR would be helpful here: #4678

I think we need a way to selectively and periodically hard refresh applications that have some sort of annotation in there.

Please, please this, or something similar.

rjackoby-r7 commented 2 years ago

+1

irishandyb commented 2 years ago

+1

This is a pretty big issue for me. My development workflow involves re-releasing a Helm chart with the same version but containing updates to the templates within the chart.

I am using the Application custom resource referencing a Helm chart from a Helm repository and not a git repository.

When I build and publish an update to the Helm chart i.e., adding a new resource, syncing the application has no effect.. the original version of the chart is cached so the updates are not picked up.

I have tried setting --default-cache-expiration 1m and --repo-cache-expiration 1m on the repo server and also using the argocd.argoproj.io/refresh: hard annotation on the Application however neither approach has worked.

This is a complete blocker for me using Argo CD. The ability to hard sync a Helm application is key; or at least have Helm charts expire from the repo cache.

Is there any plan for this bug? Otherwise any advice would be greatly appreciated.

pierluigilenoci commented 2 years ago

@irishandyb I am very interested in this problem being solved and obviously everyone is free to use things as they prefer.

IMHO I believe that changing the templates of a chart without changing the version number of the chart itself is against the paradigm of versioning itself. Even if this caching problem gets solved I doubt it solves your problem because solving such anti-paradigm stuff doesn't make much sense for the community. It basically makes the cache itself useless.

irishandyb commented 2 years ago

Hi @pierluigilenoci, thank you very much for your message. I am very grateful for your feedback.

I completely understand the importance of correct versioning for releases. However, during the development stage I often make quick and small changes to the chart that I wish to test without the need for version or release management. For example, if I add a new Service to a chart and wish to test it, I do not want to need to bump up the version in the Chart.yaml and also the referenced version in the Application resource. To quickly test my change it is easier to simply re-build and re-deploy (or sync) small changes. Again if I was to subsequently find a bug in the Service, such as pointing to the wrong port in the Pod, then another quick change to the template, re-build and sync.

In short, in the development cycle, I am avoiding the need to continuously update the Chart.yaml and Application resource.

Is this incorrect? Should I look at automating the bumping of the versions or is there another common pattern?

pierluigilenoci commented 2 years ago

@irishandyb I understand the particular need but I find it to be a bit overdoing. There are leaner ways to test if a chart is valid and does what it takes than going through ArgoCD. For example by generating the manifests locally or by manually installing them on a cluster.

However, there are tools to automate versioning. For example https://github.com/sstarcher/helm-release And a precise strategy. https://semver.org/#spec-item-9

De1taE1even commented 2 years ago

@irishandyb @pierluigilenoci You both bring up valid points, I have a similar dev cycle that re-uses tags, knowing that isn't the "best practice standard". All this aside though, I think we need to step back and look at what is actually being asked which would be beneficial in more than one situation, and that is to either: (1) the ability to hard refresh automatically, periodically or much less desirably: (2) the ability to disable cache entirely

The re-use of tags is just one use-case but more generically, being able to re-check the helm chart repository for changes periodically is something that argocd should have the ability to do, like what it currently does with git repositories.

On top of all of this, there's the point to be made that this is basically a bug, not a feature request, since argocd currently has a cache expiration parameter that straight-up doesn't work at all. Simply honoring this param would solve these issues. You could set the cache expiration to whatever you want, which, if working, would force argocd to go back to the helm chart repo source to check truth.

irishandyb commented 2 years ago

@pierluigilenoci - thank you very much for your input. I will research further and consider changing my process.

That being said, I do agree with @De1taE1even and it would be nice to see some options around caching of Helm charts.

pierluigilenoci commented 2 years ago

@De1taE1even @irishandyb IMHO the discussion here is overlapping different themes:

versioning of the charts
the invalidation of ArgoCD caches
the caching bug that generated this issue

Regarding the versioning of the charts, invalidating the cache is not necessary because Semantic Versioning supports pre-releases and therefore each test can have its own clear tag (and in this way it is also easier to log changes). Let's say for example that I have the chart version 1.2.3 and I want to make some changes. So I can name every single test version as 1.2.4-alpha-1, 1.2.4-alpha-2 etc (or any combination of dots, alphanumerics, and hyphens). This can also be automated with tools.

As for the cache, having the option to invalidate it with some form of trigger or to disable it entirely is certainly a desirable thing. But it would also be enough just to fix the bug that makes it unusable.

I think it might make sense to separate the feature request to improve cache management from this bug fix request to get things clearer.

De1taE1even commented 2 years ago

@pierluigilenoci Separating these topics seems fine to me. Honestly, correct me if I'm wrong, but if the cache invalidation worked properly, and I set it to 24 hours, then the first sync check after the cache was invalidated would be forced to pull from source, and that'd be good enough for my use case.

pierluigilenoci commented 2 years ago

@De1taE1even theoretically, when a cache is invalidated, the software should be forced to re-download the content from the original source.

How to use this mechanism then I'll leave to you. 😜

krzwiatrzyk commented 1 year ago

Another case when this issue occur:

All of my apps (Application CR) are using remote values (via URL) to supply it to helm. Any change in those values won't occur in Sync or Refresh. Only Hard Refresh can help. Thus, if changes will be made in those values, I am not aware of them until manually clicking Hard Refresh.

alexef commented 1 year ago

@De1taE1even I see https://github.com/argoproj/argo-cd/pull/8928 has been merged in ~April~ August, did that fix the bug mentioned in this issue or only offered a way to mitigate it?

Do I understand correctly that the bug was not repo-cache-expiration for helm, but the fact that the Application Status was cached?

LE: I checked the code and it seems it behaves like this:

repo-cache-expiration controls the redis manifest cache expiration (for both Git or Helm sources)
when the cache get is a miss, manifests are regenerated; in case of git, we check the target version again out; in case of helm, we unpack the already downloaded helm chart tar.gz and hydrate it
when "hard refresh" is used (ends up as noCache bool in this part of code), we explicitely remove the tar.gz from filesystem, which in turn triggers a new download from Helm registry.

I don't think noCache should be overloaded to control both the redis cache and the file system cache. @alexmt WDYT?

De1taE1even commented 1 year ago

@alexef You are correct, my only concern is not being able to properly expire helm chart cache. The new flag should solve this for me, and I like that it can be set to a different, less frequent interval than the normal refresh. Thank you for pointing it out, I missed it. I haven't tested it yet, but assuming it does what it's supposed to do, this is great solution to this ticket, from my perspective.

De1taE1even commented 1 year ago

@alexef Well, I don't know why it isn't working, but that flag did nothing. I confirmed 100% that the flag is applied, and I set it to 3600 to hard refresh once per hour. It's been several hours, and no automatic hard refresh was performed. I manually performed a hard refresh and immediately saw the change, so this flag isn't working properly. I even cycled every single pod as part of the argocd installation, just to make sure all pods had the most updated config. I also verified that I'm on a newer version than required for this flag to be supported.

alexef commented 1 year ago

@De1taE1even thank you for the updates. I did the same (set the timeout.hard.reconciliation to 120s in the argocd-cm ConfigMap and restarted the application controller StatefulSet). For me the flag worked, as I saw the reposerver pulled again the tar.gz chart from registry, every 2m. I'm on the 2.5/latest branch. Not sure what's going on with your setup.

De1taE1even commented 1 year ago

@alexef Is there some documentation I missed about adding timeout.hard.reconciliation to the argocd-cm configmap? The ticket that worked this feature shows it as an application controller run flag, as does this documentation, so that's how I implemented it. I can't find a reference to setting it like you did.

simone-dandreta commented 1 year ago

I am hitting a very similar case with my helm chart. My helm chart was packed with a wrong image tag and I discovered that once I deployed to Argo. So I re packed the same chart with the right image and deployed to Argo, but Argo didn't pull it from the remote repo, so I had to use the hard refresh mechanism to see the new image. I am now having a problem though - sometimes when Argo applies other manifests not related to this helm chart, it shows my application out of sync and pulls the chart with the older image.. and I am not sure where it takes that from. I had to do another hard refresh to make it working, but this happened already 2 times in the past 15 hours.. I am afraid it will happen again. Is there any other place where Argo stores the helm chart with the wrong image tag? Or what I can do? Thanks

lknite commented 6 months ago

Running into this today again, this time it was a bitnami helm chart which I submitted a pull request for, I pulled the helm chart and my fix was missing. After some communication the helm chart was updated (with the same version number) and it worked, till I redeployed. Afterwards there is a caching issue. Click 'hard refresh' hasn't helped.

Running into this issue also in my development environment where hard refresh works but afterwards when something restarts its trying to load a different version than the latest and greatest, maybe its cause I'm using an oci helm chart store. Wish I could just clear out the helm chart cache somehow.

I think this issue may not be getting the attention it needs because of philosophical issues. Gitops its awesome for developers and its perfectly reasonable to at some point to want to always pull the latest helm chart until there are scripts in place to update all the micro services helm charts and versions, along with an overall helm chart with its own version, rather than using 0.0.x which seems to get confused. Maybe a series of options could be allowed which are more developer related, something like:

--developer-option-disable-helm-chart-cache

Or per applicationset like so:

      syncPolicy:
        syncOptions:
        - CreateNamespace=true
        - DisableHelmChartCache=true

lknite commented 6 months ago

Submitted a feature request https://github.com/argoproj/argo-cd/issues/18000

argoproj / argo-cd

repo-cache-expiration is ignored for helm repositories #4002