Open tropnikovvl opened 5 months ago
I wonder if this is specific to a certain version of helm. This seems similar to https://github.com/helm/helm/issues/7600, where the helm command may not be resilient to multiple instances running at once.
I have several jobs running in parallel (via a GitHub Actions matrix), most likely on different hosts.
Can you try a newer version of helm and see if that helps?
Hello! Thanks for the update!
I'll keep an eye on it. On the previous version I hit the problem on average once in every 10-15 runs. If anything happens, I will write here.
@allenporter Unfortunately, the problem persists:
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmp5m55fxx_/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml
DEBUG:flux_local.command:Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file
WARNING:asyncio:Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 2381 is closed
Traceback (most recent call last):
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/flux_local.py", line 61, in main
asyncio.run(action.run(**vars(args)))
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/diff.py", line 414, in run
await asyncio.gather(
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 309, in inflate
await asyncio.gather(*tasks)
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 237, in inflate_release
await visitor.func(pathlib.Path(""), release, cmd)
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 197, in call_async
objects = await cmd.objects()
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 131, in objects
return [doc async for doc in self._docs(target_namespace=target_namespace)]
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 131, in <listcomp>
return [doc async for doc in self._docs(target_namespace=target_namespace)]
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 120, in _docs
out = await self.run()
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 114, in run
return await run_piped(self._cmds)
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 122, in run_piped
result = await _run_piped_with_sem(cmds)
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 110, in _run_piped_with_sem
out = await asyncio.wait_for(cmd.run(stdin), _TIMEOUT)
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 100, in run
raise self.exc("\n".join(errors))
flux_local.exceptions.HelmException: Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file
flux-local error: Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file
Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7fd7689a2a70>
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_subprocess.py", line 126, in __del__
self.close()
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_subprocess.py", line 104, in close
proto.pipe.close()
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/unix_events.py", line 746, in close
self.write_eof()
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/unix_events.py", line 732, in write_eof
self._loop.call_soon(self._call_connection_lost, None)
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
self._check_closed()
File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
Hi, what version of helm are you using? Thanks!
Hello.
I have the latest version of Helm, but I don't really understand why that matters here. The diff runs on a GitHub runner and I don't pre-install anything on it. I'm just using this workflow:
name: "Flux Diff"
on:
  push:
    branches: ["renovate/*"]
concurrency:
  group: ${{ github.workflow }}-${{ github.event.number || github.ref }}
  cancel-in-progress: true
jobs:
  diffs:
    name: Compute diffs
    runs-on: ubuntu-22.04
    steps:
      - name: Setup Flux CLI
        uses: fluxcd/flux2/action@v2.3.0
      - uses: allenporter/flux-local/action/diff@5.4.0
        id: diff
        with:
          live-branch: develop
          path: clusters/path
          resource: helmrelease
          debug: true
      - name: PR Comments
        uses: mshick/add-pr-comment@v2
        if: ${{ steps.diff.outputs.diff != '' }}
        with:
          message-id: ${{ github.ref }}/flux-diff
          message-failure: Unable to post HelmRelease diff
          message: |
            `````diff
            ${{ steps.diff.outputs.diff }}
            `````
What's the "concurrency" setting about? Does that run jobs in parallel on the same filesystem?
Basically we can't have multiple processes clobbering the local filesystem. Flux build creates temp files that may be getting messed up if two run at once in the same directory.
To do multiple runs at once they may need their own file paths checked out.
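As a rough illustration of the "own file paths" idea, here is a minimal Python sketch that gives each run its own repository cache and builds the same `helm repo update` command line seen in the logs. The helper names `isolated_helm_dirs` and `helm_repo_update_args` are hypothetical and are not part of flux-local; only the helm flags are taken from the debug output above.

```python
import tempfile
from pathlib import Path


def isolated_helm_dirs(prefix: str = "flux-local-") -> tuple[Path, Path]:
    """Create a fresh cache directory and config path for one run.

    Hypothetical helper: the caller is responsible for cleaning up the
    temporary directory afterwards.
    """
    root = Path(tempfile.mkdtemp(prefix=prefix))
    cache_dir = root / "cache"
    cache_dir.mkdir()
    return cache_dir, root / "repository-config.yaml"


def helm_repo_update_args(cache_dir: Path, config_file: Path) -> list[str]:
    """Build a `helm repo update` command that only touches the given paths."""
    return [
        "helm", "repo", "update",
        "--registry-config", "/dev/null",
        "--repository-cache", str(cache_dir),
        "--repository-config", str(config_file),
    ]


if __name__ == "__main__":
    # Two runs get two different cache directories, so neither can
    # overwrite the other's index files.
    for _ in range(2):
        cache, config = isolated_helm_dirs()
        print(" ".join(helm_repo_update_args(cache, config)))
```

The trade-off is that each isolated cache has to download the repository index again.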
All runs are launched in parallel, but each works in its own GitHub runner container, so they should not affect each other.
That's why I'm confused when I see duplicate logs:
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmp5m55fxx_/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml
DEBUG:flux_local.command:Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file
OK, this still seems consistent with helm's cache not working with multiple instances in parallel. People say the solution is to use a separate temporary directory for every instance. The reason for a shared repository cache is to avoid pulling the same repositories multiple times, especially when running diffs (everything is loaded twice). We could work around it with a lock held on each repo as a hack, but I'm not necessarily a fan of that. We could also add more controls to tune helm concurrency.
I'd prefer it if the helm CLI were fixed to be more resilient to running in parallel, of course...
Need to think about this more.
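The other option mentioned above, a control for tuning helm concurrency, could be sketched with an `asyncio.Semaphore` whose limit is configurable. The traceback suggests commands already pass through a semaphore (`_run_piped_with_sem`), but the function below is a hypothetical stand-alone example with a fake command, not the existing code.

```python
import asyncio


async def run_limited(limit: int, jobs: int) -> int:
    """Run `jobs` fake helm commands with at most `limit` at once.

    Returns the highest number of commands that ran concurrently.
    """
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def fake_helm() -> None:
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the real helm subprocess
            active -= 1

    await asyncio.gather(*(fake_helm() for _ in range(jobs)))
    return peak


if __name__ == "__main__":
    # A limit of 1 fully serializes helm; larger limits trade safety for speed.
    print("peak with limit=2:", asyncio.run(run_limited(2, 6)))
```

Setting the limit to 1 would avoid overlapping helm invocations entirely, at the cost of slower diffs.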
Hello!
This error appears intermittently, for an unknown reason, in about 1 out of every 10 runs. I'm using version 5.2.0, but I also observed it on version 5.1.0.