emissary-ingress / emissary

open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
https://www.getambassador.io
Apache License 2.0

Error: conversion webhook for getambassador.io/v3alpha1, Kind=Module failed #5007

Open JKrajnik opened 1 year ago

JKrajnik commented 1 year ago

Describe the bug The conversion webhook for getambassador.io/v3alpha1 is not working. It is still possible to use /v2, but /v3alpha1 throws the error: Post [https://emissary-apiext.emissary-system.svc:443/webhooks/crd-convert?timeout=30s]: Service Unavailable, even though the emissary-apiext pods are running without any errors.

To Reproduce Steps to reproduce the behavior:

  1. Follow the installation steps to install via Helm: https://www.getambassador.io/docs/emissary/latest/topics/install/helm

Expected behavior Installing emissary-ingress should work when following the documentation.

Versions (please complete the following information):

Additional context Log messages from emissary-ingress:

2023-04-28 04:24:44 diagd 3.6.0 [P15TMainThread] INFO: AMBASSADOR_FAST_RECONFIGURE enabled, initializing cache
2023-04-28 04:24:44 diagd 3.6.0 [P15TMainThread] INFO: WILL NOT update Mapping status
2023-04-28 04:24:44 diagd 3.6.0 [P15TMainThread] INFO: thread count 9, listening on 127.0.0.1:8004
time="2023-04-28 04:24:44.2300" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp 127.0.0.1:8004: connect: connection refused" func=github.com/emissary-ingress/emissary/v3/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
time="2023-04-28 04:24:45.2312" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp 127.0.0.1:8004: connect: connection refused" func=github.com/emissary-ingress/emissary/v3/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
time="2023-04-28 04:24:46.2327" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp 127.0.0.1:8004: connect: connection refused" func=github.com/emissary-ingress/emissary/v3/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
time="2023-04-28 04:24:47.2335" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp 127.0.0.1:8004: connect: connection refused" func=github.com/emissary-ingress/emissary/v3/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
time="2023-04-28 04:24:48.2347" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp 127.0.0.1:8004: connect: connection refused" func=github.com/emissary-ingress/emissary/v3/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
time="2023-04-28 04:24:49.2358" level=error msg="Post \"http://localhost:8004/_internal/v0/watt?url=http%3A%2F%2Flocalhost%3A9696%2Fsnapshot\": dial tcp 127.0.0.1:8004: connect: connection refused" func=github.com/emissary-ingress/emissary/v3/cmd/entrypoint.notifyWebhookUrl file="/go/cmd/entrypoint/notify.go:124" CMD=entrypoint PID=1 THREAD=/watcher/notifyCh
2023-04-28 04:24:49 diagd 3.6.0 [P15TMainThread] WARNING: Scout: could not post report: HTTPSConnectionPool(host='metriton.datawire.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3895094610>: Failed to establish a new connection: [Errno -3] Try again'))
2023-04-28 04:24:49 diagd 3.6.0 [P15TMainThread] INFO: Ambassador 3.6.0 booted
[2023-04-28 04:24:49 +0000] [15] [INFO] Starting gunicorn 20.1.0
[2023-04-28 04:24:49 +0000] [15] [INFO] Listening at: http://127.0.0.1:8004 (15)
[2023-04-28 04:24:49 +0000] [15] [INFO] Using worker: gthread
[2023-04-28 04:24:49 +0000] [17] [INFO] Booting worker with pid: 17
2023-04-28 04:24:49 diagd 3.6.0 [P17TAEW] INFO: starting Scout checker and timer logger
2023-04-28 04:24:49 diagd 3.6.0 [P17TAEW] INFO: starting event watcher
2023-04-28 04:24:50 diagd 3.6.0 [P17TAEW] ERROR: Secret fallback-self-signed-cert.emissary unknown
2023-04-28 04:24:50 diagd 3.6.0 [P17TAEW] INFO: EnvoyConfig: Generating V3
2023-04-28 04:24:50 diagd 3.6.0 [P17TAEW] INFO: V3Ready: ==== listen on 127.0.0.1:8006

cindymullins-dw commented 1 year ago

Hi @JKrajnik, we typically see this when an old CRD is hanging around, but in 3.4 we added support for v1 CRDs back in to assist migration, so that this kind of thing wouldn’t happen. Are you upgrading from 3.4 to 3.6? Judging from the logs, you may have a network issue. Is a firewall perhaps blocking the connection? Can you try adding the port to your cluster’s firewall?

JKrajnik commented 1 year ago

Hello @cindymullins-dw, this was not an upgrade. It is possible that the issue is on the network side, because the system is behind a proxy server. I have configured the proxy variables, and everything else seems to work (updating the Helm repo, downloading images). Is there something else that needs to be configured? It's also strange that other resources worked. For example, I tested creating a Listener and the conversion worked, so the issue is only with the Module resource.

apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
  name: example-listener
spec:
  port: 8080
  protocol: TCP
  securityModel: XFP
  hostBinding:
    namespace:
      from: ALL
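
For comparison with the Listener above, a minimal Module manifest would look roughly like the sketch below. The manifest that actually failed is not shown in this thread, so the name `ambassador` and the `diagnostics` setting are assumptions chosen only for illustration.

```yaml
# Hypothetical minimal Module for comparison; not the manifest from this report.
apiVersion: getambassador.io/v3alpha1
kind: Module
metadata:
  name: ambassador        # assumed name, for illustration only
spec:
  config:
    diagnostics:
      enabled: false      # assumed setting, for illustration only
```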

Because I wanted to be sure that this is not caused by an old CRD, I tested the installation on a completely different cluster (1.24.6), which is also behind the proxy, and the behavior was strange. The installation worked, so I tried deleting the custom resources and recreating them. Creating a Listener worked without any issues, but when creating a Module resource I again received conversion webhook errors from time to time. The strange thing is that the Module resource was sometimes created. After removing the proxy from the env, conversion worked every time. Unfortunately, on the original cluster (1.22.16), removing the proxy did not help. It is not a big deal to reinstall it, but it is still not working as expected. It seems that the issue is related to the proxy. Are there any recommendations for using a proxy server?
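
One thing that may be worth checking in a proxied cluster is whether the API server process itself inherits the proxy variables, since the conversion webhook call to emissary-apiext.emissary-system.svc is made by the API server. The fragment below is only a sketch of excluding in-cluster addresses from the proxy. It assumes a kubeadm-style static pod manifest, the service CIDR 10.96.0.0/12, and the cluster domain cluster.local. None of these are confirmed in this thread, and nothing here establishes that this is the cause of the error.

```yaml
# Sketch only: fragment of a kube-apiserver static pod manifest.
# Assumed path: /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm-style).
spec:
  containers:
  - name: kube-apiserver
    env:
    - name: NO_PROXY
      # Assumed values: adjust the CIDR and domain to the actual cluster.
      value: "localhost,127.0.0.1,.svc,.svc.cluster.local,10.96.0.0/12"
```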

JKrajnik commented 1 year ago

Hello @cindymullins-dw, I just reinstalled the entire cluster, going from 1.22.16 to 1.25.9, and the conversion is still working only for the Listener resource.

cindymullins-dw commented 1 year ago

Thanks for the update, @JKrajnik. We will keep this open as a bug.