emissary-ingress / emissary

open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
https://www.getambassador.io
Apache License 2.0

Problem with local Consul agent resolver and non-standard HTTP port #1508

Closed ewbankkit closed 5 years ago

ewbankkit commented 5 years ago

Describe the bug

From Slack discussion:

We have installed the Consul agent as a DaemonSet via the Helm chart, which exposes the Consul HTTP API on port 8500 on each node. I configure the Ambassador Consul resolver like so:

getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: Module
      name: ambassador
      config:
        service_port: 8080
      ---
      apiVersion: ambassador/v1
      kind: ConsulResolver
      name: consul-our-dc
      address: "${HOST_IP}:8500"
      datacenter: our-dc
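
(For context, ${HOST_IP} is assumed here to be injected into the Ambassador container via the Kubernetes Downward API; a minimal sketch of that Deployment snippet, not taken from the report above, would be:)

# Hypothetical Deployment excerpt (assumed, not from the report): expose the
# node's IP to the Ambassador container so the ConsulResolver address can
# reference ${HOST_IP}.
env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP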

and then configure the Mapping for the demo-app service, which is registered with Consul (and is not running in the Kubernetes cluster):

apiVersion: v1
kind: Service
metadata:
  name: consul-demo-app
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: Mapping
      name: consul_demo_app_mapping
      prefix: /demo-app
      resolver: consul-our-dc
      service: demo-app
spec:
  clusterIP: None

Ambassador doesn't start in a healthy state; it seems to hang:

+ set +x
+ /ambassador/watt --notify 'sh /ambassador/post_watt.sh' -s service --watch /ambassador/watch_hook.py
AMBASSADOR: waiting
PIDS: 39:ambex 40:diagd 60:watt
2019/05/07 18:24:21 starting watt...
2019/05/07 18:24:21 starting kubebootstrap
2019/05/07 18:24:21 starting consulwatchman
2019/05/07 18:24:21 starting kubewatchman
2019/05/07 18:24:21 starting aggregator
2019/05/07 18:24:21 starting invoker
2019/05/07 18:24:21 starting api
2019/05/07 18:24:21 kubebootstrap: adding kubernetes watch for "service" in namespace "*"
2019/05/07 18:24:21 api: snapshot server listening on: :7000
2019/05/07 18:24:21 starting api[1]
2019/05/07 18:24:21 kubebootstrap: found 20 "service" in namespace "*"
2019/05/07 18:24:21 kubebootstrap: sent "service" to 1 receivers
2019/05/07 18:24:22 aggregator: watch hook stderr: 2019-05-07 18:24:22 watch-hook INFO: YAML: using C parser
2019/05/07 18:24:22 aggregator: watch hook stderr: 2019-05-07 18:24:22 watch-hook INFO: YAML: using C dumper
2019/05/07 18:24:22 aggregator: watch hook stderr: 
2019/05/07 18:24:22 aggregator: found 0 kubernetes watches
2019/05/07 18:24:22 aggregator: found 1 consul watches
2019/05/07 18:24:22 aggregator: waiting for consul watch: ${HOST_IP}:8500|our-dc|demo-app
2019/05/07 18:24:22 consulwatchman: processing 1 consul watches
2019/05/07 18:24:22 kubewatchman: processing 0 kubernetes watch specs
2019/05/07 18:24:22 consulwatchman: add consul watcher consul:10.207.113.83:8500|our-dc|demo-app
2019/05/07 18:24:22 starting consul:x.x.x.x:8500|our-dc|demo-app
2019/05/07 18:24:22 starting consul:x.x.x.x:8500|our-dc|demo-app[1]
2019/05/07 18:24:22 aggregator: watch hook stderr: 2019-05-07 18:24:22 watch-hook INFO: YAML: using C parser
2019/05/07 18:24:22 aggregator: watch hook stderr: 2019-05-07 18:24:22 watch-hook INFO: YAML: using C dumper
2019/05/07 18:24:22 aggregator: watch hook stderr: 
2019/05/07 18:24:22 aggregator: found 0 kubernetes watches
2019/05/07 18:24:22 aggregator: found 1 consul watches
2019/05/07 18:24:22 aggregator: waiting for consul watch: ${HOST_IP}:8500|dre-qa-us-east-1|demo-app
2019/05/07 18:24:22 consulwatchman: processing 1 consul watches
2019/05/07 18:24:22 kubewatchman: processing 0 kubernetes watch specs

If I kubectl exec into the pod, the Consul HTTP API on port 8500 is reachable:

$ kubectl --kubeconfig $HOME/.kube/polaris-cs-poc --namespace ambassador exec -it ambassador-xxxxxxxxxxxxxxxx -- /bin/sh
/ambassador $ curl -v http://${HOST_IP}:8500/v1/health/service/demo-app?passing=1
* Expire in 0 ms for 6 (transfer 0x55cc889e2620)
*   Trying x.x.x.x...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55cc889e2620)
* Connected to x.x.x.x (x.x.x.x) port 8500 (#0)
> GET /v1/health/service/demo-app?passing=1 HTTP/1.1
> Host: x.x.x.x:8500
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< Vary: Accept-Encoding
< X-Consul-Effective-Consistency: leader
< X-Consul-Index: 41142
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Tue, 07 May 2019 18:22:47 GMT
< Transfer-Encoding: chunked
< 
[{"Node":...]

Setting address: "${HOST_IP}" (no port) gets a lot further, but of course no upstream endpoints become healthy for demo-app, since the health checks against https://x.x.x.x/v1/health/service/demo-app?passing=1 all fail.

If I substitute the real host IP address (address: "x.x.x.x:8500"), then everything starts fine.

It looks like the environment variable isn't being interpolated at some point.
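
One way to rule out the environment itself (a hypothetical check, not part of the original report) is to confirm that HOST_IP is actually set inside the Ambassador container:

# Hypothetical sanity check: verify HOST_IP is present in the container's
# environment; if it is, the problem lies in how the annotation is interpolated.
kubectl --kubeconfig $HOME/.kube/polaris-cs-poc --namespace ambassador \
  exec -it ambassador-xxxxxxxxxxxxxxxx -- printenv HOST_IP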


richarddli commented 5 years ago

Thanks! More details here: https://github.com/datawire/teleproxy/issues/110

richarddli commented 5 years ago

Fix has shipped in 0.61.