allenporter / flux-local

flux-local is a set of tools and libraries for managing a local flux gitops repository focused on validation steps to help improve quality of commits, PRs, and general local testing.
https://allenporter.github.io/flux-local/
Apache License 2.0

`in Replace: id matched 2 resources` error can still be reached #754

Closed jfroy closed 3 weeks ago

jfroy commented 4 months ago

Summary

My repo somehow still reproduces the in Replace: id matched 2 resources error originally reported in #707:

flux-local error:  Command 'flux build ks ingress-certificates --dry-run --kustomization-file /dev/stdin --path /pull/kubernetes/apps/ingress-certificates/ingress-certificates/app --namespace flux-system' failed with return code 1
✗ in Replace: id matched 2 resources

Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7901797ae7a0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/base_subprocess.py", line 126, in __del__
    self.close()
  File "/usr/local/lib/python3.12/asyncio/base_subprocess.py", line 104, in close
    proto.pipe.close()
  File "/usr/local/lib/python3.12/asyncio/unix_events.py", line 568, in close
    self._close(None)
  File "/usr/local/lib/python3.12/asyncio/unix_events.py", line 592, in _close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 795, in call_soon
    self._check_closed()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 541, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7901797ae7a0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/base_subprocess.py", line 126, in __del__
    self.close()
  File "/usr/local/lib/python3.12/asyncio/base_subprocess.py", line 104, in close
    proto.pipe.close()
  File "/usr/local/lib/python3.12/asyncio/unix_events.py", line 568, in close
    self._close(None)
  File "/usr/local/lib/python3.12/asyncio/unix_events.py", line 592, in _close
    self._loop.call_soon(self._call_connection_lost, exc)
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 795, in call_soon
    self._check_closed()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 541, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

This issue was originally filed about the RuntimeError: Event loop is closed error, which is a separate problem where flux does not exit cleanly after reporting an earlier error.

Reproduction

mkdir repro
cd repro
git clone https://github.com/jfroy/flatops -b flux-local-754-default default
git clone https://github.com/jfroy/flatops -b flux-local-754-pull pull
sudo chown -R 1001 *
docker run --rm --workdir $PWD -v $PWD/default:/default -v $PWD/pull:/pull ghcr.io/allenporter/flux-local:main diff kustomization --unified 6 --path /pull/kubernetes/flux --path-orig /default/kubernetes/flux --strip-attrs "helm.sh/chart,checksum/config,app.kubernetes.io/version,chart" --limit-bytes 10000 --all-namespaces --sources "home-kubernetes" --output-file diff.patch

jfroy commented 4 months ago

Ah I looked a bit closer at the logs and I think this is #707. So I think after that error happens flux-local just doesn't exit cleanly.

jfroy commented 4 months ago

Hum, but you checked in a fix for this (#734), so this should not be happening anymore?

allenporter commented 4 months ago

Yeah, I think your assessment looks right. I thought that was fixed. What version of flux-local are you running? (I saw you had a fork and your own images, so I just want to make sure it's not a version that is behind.)

jfroy commented 4 months ago

> Yeah, I think your assessment looks right. I thought that was fixed. What version of flux-local are you running? (I saw you had a fork and your own images, so I just want to make sure it's not a version that is behind.)

I'll do another clean repro tomorrow using known-bad hashes from my repo (I've applied a workaround since reporting), but I am reasonably sure I reproduced with the above docker command, which pulls from your repo. But maybe the main label is not latest?

allenporter commented 4 months ago

Yep, I see you're right in your example above. I will have to take a closer look, thanks.

jfroy commented 4 months ago

I've updated the reproduction steps to use stable tags for this issue.

DapperDivers commented 3 weeks ago

Also seeing this issue. If you're looking for any info, I'm happy to help if possible.

allenporter commented 3 weeks ago

One thing I did was try to build this with a simpler build command.

When I try this from HEAD in that repo, it works:

$ mkdir repro
$ cd repro
$ git clone https://github.com/jfroy/flatops
$ cd flatops
$ flux-local build ks ingress-certificates --path kubernetes/flux --sources "home-kubernetes" -n flux-system
...

However, something is wrong in that branch:

$ git checkout flux-local-754-default
$ flux-local build ks ingress-certificates --path kubernetes/flux --sources "home-kubernetes" -n flux-system
flux-local error:  Command 'flux build ks ingress-certificates --dry-run --kustomization-file /dev/stdin --path /workspaces/repro/flatops/kubernetes/apps/ingress-certificates/ingress-certificates/app --namespace flux-system' failed with return code 1
✗ in Replace: id matched 2 resources

When I run flux directly against the raw ks.yaml file, without the substitutions, it works:

$ flux build ks ingress-certificates --dry-run --kustomization-file kubernetes/apps/ingress-certificates/ingress-certificates/ks.yaml --path kubernetes/apps/ingress-certificates/ingress-certificates/app

allenporter commented 3 weeks ago

(Perhaps you added the workaround of setting a default value, so that's why it's now working.)

allenporter commented 3 weeks ago

I added some logging in the middle of the build step, and the intermediate fluxtomization passed into flux build looks like this:

---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  labels:
    kustomize.toolkit.fluxcd.io/name: cluster-apps
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: ingress-certificates
  namespace: flux-system
  annotations:
    config.kubernetes.io/index: '26'
    internal.config.kubernetes.io/index: '26'
spec:
  commonMetadata:
    labels:
      app.kubernetes.io/name: ingress-certificates
  decryption:
    provider: sops
    secretRef:
      name: sops-age
  dependsOn:
  - name: cert-manager-issuers
  interval: 30m
  path: ./kubernetes/apps/ingress-certificates/ingress-certificates/app
  postBuild:
    substituteFrom:
    - kind: ConfigMap
      name: cluster-settings
    - kind: Secret
      name: cluster-secrets
    - kind: ConfigMap
      name: cluster-user-settings
      optional: true
    - kind: Secret
      name: cluster-user-secrets
      optional: true
    substitute:
      CLUSTER_POD_V6_CIDR: ..PLACEHOLDER..
      CLUSTER_SVC_V6_CIDR: ..PLACEHOLDER..
      CLUSTER_LBA_V6_CIDR: ..PLACEHOLDER..
      PUBLIC_DOMAIN0: ..PLACEHOLDER..
      PUBLIC_DOMAIN1: ..PLACEHOLDER..
      PUBLIC_DOMAIN2: ..PLACEHOLDER..
      OMNI_ACCOUNT_UUID: ..PLACEHOLDER..
      OMNI_AUTH0_CLIENT_ID: ..PLACEHOLDER..
      OMNI_AUTH0_DOMAIN: ..PLACEHOLDER..
      SECRET_ACME_EMAIL: ..PLACEHOLDER..
      SECRET_CLOUDFLARE_TUNNEL_ID: ..PLACEHOLDER..
  prune: true
  retryInterval: 1m
  sourceRef:
    kind: GitRepository
    name: home-kubernetes
  targetNamespace: ingress-certificates
  timeout: 5m
  wait: false

allenporter commented 3 weeks ago

I can't see what is wrong with that Kustomization. Save the Kustomization above to /tmp/x and the failure can be reproduced like this:

$ flux build ks ingress-certificates --dry-run --kustomization-file /tmp/x --path kubernetes/apps/ingress-certificates/ingress-certificates/app 

With the placeholder values set like this, it fails:

      PUBLIC_DOMAIN0: ..PLACEHOLDER..
      PUBLIC_DOMAIN1: ..PLACEHOLDER..
      PUBLIC_DOMAIN2: ..PLACEHOLDER..

With the placeholder values set like this, it succeeds:

      PUBLIC_DOMAIN0: ..PLACEHOLDERA..
      PUBLIC_DOMAIN1: ..PLACEHOLDERB..
      PUBLIC_DOMAIN2: ..PLACEHOLDERC..
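
The difference between the two runs suggests giving every unresolved variable its own placeholder. The Python sketch below illustrates that idea only. The helper names `placeholder_for` and `build_substitute`, and the `..PLACEHOLDER_<NAME>..` format, are hypothetical and are not necessarily what flux-local does internally.

```python
def placeholder_for(name: str) -> str:
    """Return a placeholder that embeds the variable name (hypothetical format)."""
    return f"..PLACEHOLDER_{name}.."


def build_substitute(missing_vars: list[str]) -> dict[str, str]:
    """Map every unresolved variable to its own distinct placeholder."""
    return {name: placeholder_for(name) for name in missing_vars}


# The three variables from the experiment above.
substitute = build_substitute(["PUBLIC_DOMAIN0", "PUBLIC_DOMAIN1", "PUBLIC_DOMAIN2"])
for key, value in substitute.items():
    print(f"{key}: {value}")
```

Because the variable name is part of the value, two different variables can never expand to the same string, which is the property the succeeding run relies on.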

allenporter commented 3 weeks ago

I see, the primary issue is that the resources do not end up with unique names. When the replacement happens for a value like this, every variable is set to the same placeholder, so the rendered names collide:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: "${PUBLIC_DOMAIN0/./-}-production"

The difference from the other issues is that those use settings that can be found. In this case the values come from secrets that are not found, so they are all replaced with the same placeholder.
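
To make the collision concrete, here is a small Python sketch that renders three Certificate names of the form shown above and reports any name produced more than once. It is a simplified stand-in, not Flux's substitution code: `substitute` and `duplicate_names` are hypothetical helpers, only the `${VAR/pattern/replacement}` form is handled, and the pattern is assumed to be a literal string replaced at its first occurrence.

```python
import re
from collections import Counter

# Matches only the ${VAR/pattern/replacement} form (simplified assumption).
_SUBST = re.compile(r"\$\{(\w+)/([^/}]*)/([^}]*)\}")


def substitute(template: str, values: dict[str, str]) -> str:
    """Expand ${VAR/pattern/replacement}, replacing the first literal match."""
    def expand(match: re.Match) -> str:
        name, pattern, replacement = match.groups()
        return values[name].replace(pattern, replacement, 1)

    return _SUBST.sub(expand, template)


def duplicate_names(templates: list[str], values: dict[str, str]) -> list[str]:
    """Return the rendered names that occur more than once."""
    counts = Counter(substitute(t, values) for t in templates)
    return [name for name, n in counts.items() if n > 1]


# One name template per domain variable, as in the Certificate above.
templates = [f"${{PUBLIC_DOMAIN{i}/./-}}-production" for i in range(3)]

# Same placeholder for every variable: all three names collapse into one.
same = {f"PUBLIC_DOMAIN{i}": "..PLACEHOLDER.." for i in range(3)}
# Distinct placeholder per variable: the names stay unique.
unique = {f"PUBLIC_DOMAIN{i}": f"..PLACEHOLDER{c}.." for i, c in enumerate("ABC")}

print(duplicate_names(templates, same))
print(duplicate_names(templates, unique))
```

With the shared placeholder all three templates render to one name, so the three Certificates share a single resource id; with distinct placeholders the list of duplicates is empty.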

allenporter commented 3 weeks ago

Merged a fix and tagged as release 6.0.01 -- thanks for the report!

DapperDivers commented 3 weeks ago

Works perfectly! Thank you!!