akuity / kargo

Application lifecycle orchestration
https://kargo.akuity.io/
Apache License 2.0

Helm dependency update gets killed during helm chart promotion #2123

Open shabbirsaifee92 opened 3 weeks ago

shabbirsaifee92 commented 3 weeks ago

Description

When trying to update the Helm umbrella chart, the helm dependency update process gets killed before it can finish. Very rarely it goes through, but most of the time it gets killed.

Screenshots

[Screenshots: 2024-06-05, 3:45:54 PM and 3:26:46 PM]

If you keep retrying the promotion, it eventually builds and creates the PR.

[Screenshot: 2024-06-05, 3:43:00 PM]

Steps to Reproduce

  1. Create a Warehouse that subscribes to an image and a Helm chart:
    apiVersion: kargo.akuity.io/v1alpha1
    kind: Warehouse
    metadata:
      name: redis
      namespace: redis
    spec:
      subscriptions:
      - image:
          repoURL: redis
      - chart:
          repoURL: https://charts.bitnami.com/bitnami
          name: redis
  2. Create a Stage that uses the gitRepoUpdates promotion mechanism with Helm:
    apiVersion: kargo.akuity.io/v1alpha1
    kind: Stage
    metadata:
      name: pre-prod
      namespace: redis
    spec:
      subscriptions:
        warehouse: redis
      promotionMechanisms:
        gitRepoUpdates:
        - repoURL: <gitops-repo>
          insecureSkipTLSVerify: false
          writeBranch: main
          pullRequest: {}
          helm:
            charts:
            - repository: https://charts.bitnami.com/bitnami
              name: redis
              chartPath: helm/redis/environments/pre-prod
            images:
            - image: redis
              key: 'deployment.image.tag'
              value: Tag
              valuesFilePath: helm/redis/environments/pre-prod/values.yaml
  3. Create the umbrella chart at helm/redis/environments/pre-prod in the GitOps repo, with a Chart.yaml and a values.yaml:

    # Chart.yaml
    apiVersion: v2
    name: redis
    description: A Helm chart for deploying redis
    type: application
    version: 1.0.0
    dependencies:
    - name: redis
      repository: https://charts.bitnami.com/bitnami
      version: 19.0.0

    # values.yaml
    deployment:
      replicas: 1
      image:
        name: redis
        tag: '1.0.0'

# Version

0.6.0

# Logs

time="2024-06-05T19:26:34Z" level=info msg="began promotion" freight=d54857f5d17c5e76261d4ff116afd003e1d9064a namespace=redis promotion=staging.01hzmxw2ttka1hxphvpn42qzwg.d54857f stage=staging
time="2024-06-05T19:26:38Z" level=error msg="error executing Promotion: error executing Git-based promotion mechanisms: error executing Helm promotion mechanism: error updating dependencies for chart \"helm/redis/environments/pre-prod\": :error running helm dependency update for chart at \"/tmp/repo-155493290/repo/helm/redis/environments/pre-prod\": error executing cmd [/usr/local/bin/helm dependency update /tmp/repo-155493290/repo/helm/redis/environments/pre-prod]: Getting updates for unmanaged Helm repositories...\n" freight=d54857f5d17c5e76261d4ff116afd003e1d9064a namespace=redis promotion=staging.01hzmxw2ttka1hxphvpn42qzwg.d54857f stage=staging
time="2024-06-05T19:26:38Z" level=info msg="promotion Errored" freight=d54857f5d17c5e76261d4ff116afd003e1d9064a namespace=redis promotion=staging.01hzmxw2ttka1hxphvpn42qzwg.d54857f stage=staging

krancour commented 3 weeks ago

@shabbirsaifee92 is the subscription to the Redis image repo and the step to update a values.yaml with the Redis image's tag actually necessary to reproduce this?

I ask because I don't believe that should have any bearing on the step that updates the Chart.yaml... but stranger things have happened.

I figured some clarity on this might help get to the bottom of this quicker.

shabbirsaifee92 commented 3 weeks ago

> @shabbirsaifee92 is the subscription to the Redis image repo and the step to update a values.yaml with the Redis image's tag actually necessary to reproduce this?
>
> I ask because I don't believe that should have any bearing on the step that updates the Chart.yaml... but stranger things have happened.
>
> I figured some clarity on this might help get to the bottom of this quicker.

No, I don't believe they are necessary; I just wanted to provide the full picture of my setup.

krancour commented 3 weeks ago

Thanks for the info @shabbirsaifee92

hiddeco commented 3 weeks ago

Does this happen for any umbrella chart or those that rely on specific upstream Helm repositories and/or charts? In addition, does the problem potentially go away when you change the dependency to make use of an OCI Helm chart?

shabbirsaifee92 commented 3 weeks ago

> Does this happen for any umbrella chart or those that rely on specific upstream Helm repositories and/or charts? In addition, does the problem potentially go away when you change the dependency to make use of an OCI Helm chart?

I'll try it with an OCI registry. Our setup always uses an umbrella chart that wraps the real Helm chart hosted in a repository.

shabbirsaifee92 commented 3 weeks ago

Hey, thanks! Using an OCI registry everywhere (in the chart, the Warehouse, and the Stages) did the trick. Promotions are no longer erroring out!

shabbirsaifee92 commented 3 weeks ago

Do we know why this happens?

hiddeco commented 3 weeks ago

The OCI charts are lighter, in terms of both memory and disk usage.

I can't tell what the precise culprit is in your scenario based on the information you shared. But I suspect it has to do with the size of the repository indexes: the index is temporarily stored on disk (which may well be an in-memory tmpfs), and parsing the index YAML also consumes quite a bit of memory (to mitigate this I added a --json to Helm, but it has not been widely adopted).

Does your controller stay alive at the point it fails? Or did it get OOMKilled by any chance?

shabbirsaifee92 commented 3 weeks ago

The controller is not getting OOMKilled. I initially thought the same, but since the container/pod is fine, I'm not sure what is killing the helm dependency update process.

hiddeco commented 2 weeks ago

I'm going to try to reproduce this, both to address the issue and to see whether Kargo itself can be more upfront about the precise problem it is running into.
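The logged error in the report carries only helm's output ("Getting updates for unmanaged Helm repositories..."), not how the process ended. As a minimal sketch of what being more upfront could look like, assuming a Unix-like system, the hypothetical helper runAndDescribe below (not Kargo's actual code) uses Go's os/exec to tell a signal kill apart from an ordinary non-zero exit. A process killed with SIGKILL, for example by the kernel OOM killer, shows up to its parent as terminated by a signal rather than with an exit code.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// runAndDescribe is a hypothetical helper (not Kargo's implementation). It runs
// a command and, on failure, says how the process ended in addition to
// returning the combined stdout/stderr output.
func runAndDescribe(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err == nil {
		return string(out), nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// On Unix-like systems Sys() returns a syscall.WaitStatus, which tells
		// us whether the process was terminated by a signal (e.g. SIGKILL).
		if ws, ok := exitErr.Sys().(syscall.WaitStatus); ok && ws.Signaled() {
			return string(out), fmt.Errorf("%s was killed by signal %d (%v); output: %s",
				name, int(ws.Signal()), ws.Signal(), out)
		}
		// Otherwise the process exited on its own with a non-zero code.
		return string(out), fmt.Errorf("%s exited with code %d; output: %s",
			name, exitErr.ExitCode(), out)
	}
	// The command never ran, e.g. the binary was not found.
	return string(out), fmt.Errorf("could not run %s: %w", name, err)
}
```

With this, a killed helm dependency update would surface as "helm was killed by signal 9 (...)" followed by its output, which would immediately distinguish an external kill from a Helm-side error.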

hiddeco commented 2 weeks ago

@shabbirsaifee92 can you share more details about e.g. the number of Bitnami charts you have in your umbrella chart? I have thus far been unable to reproduce it, even with nearly a dozen different (Bitnami) chart dependencies.

shabbirsaifee92 commented 2 weeks ago

@hiddeco

# Chart.yaml
apiVersion: v2
name: redis
description: A Helm chart for deploying redis
type: application
version: 1.0.0
dependencies:
  - name: redis
    repository: https://charts.bitnami.com/bitnami
    version: 19.0.0

---
# values.yaml
deployment:
  replicas: 1
  image:
    name: redis
    tag: '1.0.0'

These are literally the Chart.yaml and values.yaml files I am using. Not sure why it's failing in my case, though.