allenporter / flux-local

flux-local is a set of tools and libraries for managing a local flux gitops repository focused on validation steps to help improve quality of commits, PRs, and general local testing.
https://allenporter.github.io/flux-local/
Apache License 2.0

Error with diff in github actions #666

Open tropnikovvl opened 6 months ago

tropnikovvl commented 6 months ago

Hello!

This error appears intermittently for an unknown reason, in roughly 1 out of 10 runs. I'm using version 5.2.0, but I also observed it on version 5.1.0.

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmpeo1pvo2y/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml
DEBUG:flux_local.command:Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file

Traceback (most recent call last):
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/flux_local.py", line 61, in main
    asyncio.run(action.run(**vars(args)))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/diff.py", line 414, in run
    await asyncio.gather(
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 309, in inflate
    await asyncio.gather(*tasks)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 237, in inflate_release
    await visitor.func(pathlib.Path(""), release, cmd)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 197, in call_async
    objects = await cmd.objects()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 128, in objects
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 128, in <listcomp>
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 118, in _docs
    out = await self.run()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 112, in run
    return await run_piped(self._cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 120, in run_piped
    result = await _run_piped_with_sem(cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 110, in _run_piped_with_sem
    out = await asyncio.wait_for(cmd.run(stdin), _TIMEOUT)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 100, in run
    raise self.exc("\n".join(errors))
flux_local.exceptions.HelmException: Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file

flux-local error:  Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file
allenporter commented 6 months ago

I wonder if perhaps this is specific to a certain version of helm. This seems similar to https://github.com/helm/helm/issues/7600, where the helm command may not be resilient to multiple instances running at once.

tropnikovvl commented 6 months ago

I have several jobs running in parallel (via a GitHub Actions matrix), and most likely they are executed on different hosts.

allenporter commented 4 months ago

Can you try a newer version of helm and see if that helps?

tropnikovvl commented 4 months ago

Hello! Thanks for the update!

I'll keep an eye on it. On the previous version I hit the problem on average 1 time in 10-15 runs. If anything happens, I will write here.

tropnikovvl commented 4 months ago

@allenporter Unfortunately, the problem persists:

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmp5m55fxx_/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml
DEBUG:flux_local.command:Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

WARNING:asyncio:Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 2381 is closed
Traceback (most recent call last):
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/flux_local.py", line 61, in main
    asyncio.run(action.run(**vars(args)))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/diff.py", line 414, in run
    await asyncio.gather(
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 309, in inflate
    await asyncio.gather(*tasks)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 237, in inflate_release
    await visitor.func(pathlib.Path(""), release, cmd)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 197, in call_async
    objects = await cmd.objects()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 131, in objects
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 131, in <listcomp>
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 120, in _docs
    out = await self.run()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 114, in run
    return await run_piped(self._cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 122, in run_piped
    result = await _run_piped_with_sem(cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 110, in _run_piped_with_sem
    out = await asyncio.wait_for(cmd.run(stdin), _TIMEOUT)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 100, in run
    raise self.exc("\n".join(errors))
flux_local.exceptions.HelmException: Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

flux-local error:  Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7fd7689a2a70>
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_subprocess.py", line 126, in __del__
    self.close()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_subprocess.py", line 104, in close
    proto.pipe.close()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/unix_events.py", line 746, in close
    self.write_eof()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/unix_events.py", line 732, in write_eof
    self._loop.call_soon(self._call_connection_lost, None)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
allenporter commented 4 months ago

Hi, what version of helm are you using? Thanks!

tropnikovvl commented 4 months ago

Hello.

I have the latest version of Helm, but I don't really understand why that matters here. The diff runs in the GitHub runner, and I do not pre-install anything on it. I'm just using this workflow:

name: "Flux Diff"

on:
  push:
    branches: ["renovate/*"]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.number || github.ref }}
  cancel-in-progress: true

jobs:
  diffs:
    name: Compute diffs
    runs-on: ubuntu-22.04
    steps:
      - name: Setup Flux CLI
        uses: fluxcd/flux2/action@v2.3.0

      - uses: allenporter/flux-local/action/diff@5.4.0
        id: diff
        with:
          live-branch: develop
          path: clusters/path
          resource: helmrelease
          debug: true

      - name: PR Comments
        uses: mshick/add-pr-comment@v2
        if: ${{ steps.diff.outputs.diff != '' }}
        with:
          message-id: ${{ github.ref }}/flux-diff
          message-failure: Unable to post HelmRelease diff
          message: |
            `````diff
            ${{ steps.diff.outputs.diff }}
allenporter commented 4 months ago

What's the "concurrency" about? Does that run in parallel on the same filesystem?

Basically, we can't have multiple processes clobbering the local filesystem. Flux build creates temp files that may get corrupted if two runs happen at once in the same directory.

To do multiple runs at once, each may need its own checked-out file path.

tropnikovvl commented 4 months ago

All launches are performed in parallel, but they run in individual containers on GitHub runners and should not affect each other. (Two screenshots attached.)

That's why I'm confused when I see duplicate log lines:

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmp5m55fxx_/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml
DEBUG:flux_local.command:Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file
allenporter commented 4 months ago

OK, this still seems consistent with helm's cache not working with multiple instances in parallel. People say the solution is to use a separate temporary directory for every instance. The reason for a shared repository cache is to avoid pulling the same repositories multiple times, especially when running diffs (everything is loaded twice). We could work around this with a lock held on each repo as a hack, but I'm not necessarily a fan of that. We could also add more controls to tune helm concurrency.
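
A minimal sketch of the lock idea, assuming the two `helm repo update` calls that share one `--repository-cache` in the logs above are started from the same Python process. `RepoCacheLocks` and `fake_repo_update` are hypothetical names for illustration, not part of flux-local, and the helm call is replaced by a stand-in so the sketch runs without helm installed:

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RepoCacheLocks:
    """Hypothetical helper: one asyncio.Lock per repository cache directory."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, cache_dir: str, action: Callable[[], Awaitable[T]]) -> T:
        # Only one action at a time may touch a given cache directory;
        # actions on different cache directories still run concurrently.
        async with self._locks[cache_dir]:
            return await action()


async def demo() -> list[str]:
    locks = RepoCacheLocks()
    events: list[str] = []

    async def fake_repo_update(name: str) -> None:
        # Stand-in for `helm repo update`; no real command is executed.
        events.append(f"start {name}")
        await asyncio.sleep(0.01)
        events.append(f"end {name}")

    # Two updates against the same cache directory, started together.
    await asyncio.gather(
        locks.run("/tmp/shared-cache", lambda: fake_repo_update("a")),
        locks.run("/tmp/shared-cache", lambda: fake_repo_update("b")),
    )
    return events
```

With the lock in place the second update only starts after the first has finished, so the index file is never written by two updates at once. This only helps within a single process; it does nothing for separate processes sharing a cache directory.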

I'd prefer if the helm CLI were fixed to be more resilient to running in parallel, of course.

Need to think about this more.
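
For comparison, the separate-directory approach mentioned above could look roughly like this. It is a sketch only: `helm_repo_flags` and `isolated_helm_flags` are hypothetical helpers, and the flags are the ones already visible in the logged commands (`--registry-config`, `--repository-cache`, `--repository-config`):

```python
import tempfile
from pathlib import Path


def helm_repo_flags(config_dir: Path, cache_dir: Path) -> list[str]:
    """Build the repository-related helm flags seen in the debug logs."""
    return [
        "--registry-config", "/dev/null",
        "--repository-cache", str(cache_dir),
        "--repository-config", str(config_dir / "repository-config.yaml"),
    ]


def isolated_helm_flags(parent: Path) -> list[str]:
    """Create a fresh working directory with its own cache under `parent`.

    Every call gets a directory of its own, so no two helm instances
    ever write the same index file.
    """
    work_dir = Path(tempfile.mkdtemp(dir=parent))
    cache_dir = work_dir / "cache"
    cache_dir.mkdir()
    return helm_repo_flags(work_dir, cache_dir)
```

The cost is the one already noted: each instance has to download the repository index again instead of reusing a shared cache.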