borchero / switchboard

Kubernetes Operator for Automatically Issuing DNS Records and TLS Certificates for Traefik Ingress Routes.
MIT License

Failed to start manager #57

Closed: nrichardson-akasa closed this issue 1 year ago

nrichardson-akasa commented 1 year ago

I keep getting the following error:

2023-01-06T22:59:54.154Z    FATAL    failed to run manager    {"error": "failed to wait for ingressroute caches to sync: timed out waiting for cache to be synced"}
main.main
    /workspace/cmd/main.go:73
runtime.main
    /usr/local/go/src/runtime/proc.go:250

I've verified that all the RBAC resources have been deployed, that the Traefik CRDs are installed (I can successfully reach an IngressRoute deployed for the dashboard), and that cert-manager and external-dns are deployed. This is my values.yaml for the Helm chart (version 0.5.5):

integrations:
  externalDNS:
    enabled: true
    targetService:
      name: external-dns
      namespace: external-dns
  certManger:
    enabled: true
    certificateTemplate:
      spec:
        issuerRef:
          kind: ClusterIssuer
          name: cluster-issuer

I'm not sure what I'm missing here.

borchero commented 1 year ago

Hey @nrichardson-akasa! Sorry to hear that you have problems. Which version of Switchboard are you using?

nrichardson-akasa commented 1 year ago

I'm using v0.5.5 of the Helm chart

nrichardson-akasa commented 1 year ago

After some further debugging, I've narrowed it down to the externalDNS integration: if I disable externalDNS and enable only the cert-manager integration, Switchboard starts fine. It seems I'm supposed to point targetService at the Traefik service rather than the external-dns service, but that still fails with the following config:

integrations:
  externalDNS:
    enabled: true
    targetService:
      name: traefik
      namespace: traefik

nrichardson-akasa commented 1 year ago

I finally found the issue. I compiled the source code myself, enabled Development-level logging for the controller-runtime package, and found that the external-dns CRDs were not installed. I was using the external-dns Helm chart from https://github.com/kubernetes-sigs/external-dns, while you are expecting the Bitnami one. The non-Bitnami chart does NOT have an option to install the CRDs directly, which is why I was getting errors. I uninstalled the previous Helm chart, switched to the Bitnami chart, and now everything seems to be running! You might want to add a note about this for others and/or offer an option to enable debug logging for controller-runtime.
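
For anyone hitting the same cache-sync timeout, a quick check for the missing CRDs described above might look like the sketch below. The CRD names are assumptions: they are the names that Traefik v2, external-dns, and cert-manager commonly registered at the time, not names taken from Switchboard's source. `missing_crds` is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# CRDs the integrations are ASSUMED to need (adjust to your versions).
REQUIRED_CRDS="ingressroutes.traefik.containo.us dnsendpoints.externaldns.k8s.io certificates.cert-manager.io"

# Reads the output of `kubectl get crd -o name` on stdin and prints
# every required CRD that is not present in that list.
missing_crds() {
  installed=$(cat)
  for crd in $REQUIRED_CRDS; do
    # -x: match the whole line, -F: treat the name as a fixed string
    printf '%s\n' "$installed" \
      | grep -qxF "customresourcedefinition.apiextensions.k8s.io/$crd" \
      || printf '%s\n' "$crd"
  done
}

# Usage against a live cluster (requires kubectl and cluster access):
#   kubectl get crd -o name | missing_crds
```

If the helper prints nothing, all three assumed CRDs are installed. Any name it prints is a CRD whose informer cache could not sync, which would be consistent with the timeout reported at the top of this issue.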

borchero commented 1 year ago

Glad you figured it out! I think we can add a hint to the README and add some logging. I'd appreciate it if you could open a PR; otherwise, I'll do it in the next few weeks/months 😄