apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0

Discover service error by Consul for apisix version 3.9 #11134

Open jzhao20230918 opened 7 months ago

jzhao20230918 commented 7 months ago

Description

Hello,

We are using Consul for service discovery, and everything worked fine with APISIX v3.8. After upgrading to v3.9, we got the following errors:

2024/04/09 06:07:16 [error] 49#49: *8920 [lua] init.lua:91: nodes(): fetch nodes failed by *, return default service, client: 10.0.2.59, server: _, request: "GET / HTTP/1.1", host: "***"
2024/04/09 06:07:16 [error] 49#49: *8920 [lua] init.lua:548: handle_upstream(): failed to set upstream: no valid upstream node: nil, client: 10.0.2.59, server: _, request: "GET / HTTP/1.1", host: "*****"

Nothing else changed except the APISIX version. Thanks a lot.

Environment

jzhao20230918 commented 7 months ago

The following error page is returned:

<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>openresty</center>
<p><em>Powered by <a href="https://apisix.apache.org/">APISIX</a>.</em></p></body>
</html>

shreemaan-abhishek commented 7 months ago

please share your configurations

jzhao20230918 commented 7 months ago

please share your configurations

I'm using https://github.com/apache/apisix/blob/master/conf/config-default.yaml and changed the etcd and consul configuration.

config.yaml.txt

jzhao20230918 commented 7 months ago

By the way, I run APISIX with the Docker image apache/apisix:3.9.0-debian.

shreemaan-abhishek commented 6 months ago

The information you provided is insufficient to attempt reproduction of this bug

jzhao20230918 commented 6 months ago

The information you provided is insufficient to attempt reproduction of this bug

here is a simple version of config: apisix: node_listen: 9080 enable_ipv6: false enable_control: true control: ip: "0.0.0.0" port: 9092 deployment: admin: allow_admin:

I got this error after starting APISIX:

2024/04/15 03:29:29 [error] 49#49: *40 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/discovery/consul/init.lua:525: attempt to concatenate local 'svc_port' (a nil value)
stack traceback:
coroutine 0:
/usr/local/apisix/apisix/discovery/consul/init.lua: in function </usr/local/apisix/apisix/discovery/consul/init.lua:362>, context: ngx.timer

I suspect something is wrong here and that it causes service discovery by Consul to fail. The request errors that follow are:

2024/04/15 03:42:22 [error] 49#49: *7191 [lua] init.lua:91: nodes(): fetch nodes failed by ws-http-echo, return default service, client: 10.0.2.59, server: _, request: "GET / HTTP/1.1", host: "echo.external.apisix"
2024/04/15 03:42:22 [error] 49#49: *7191 [lua] init.lua:548: handle_upstream(): failed to set upstream: no valid upstream node: nil, client: 10.0.2.59, server: _, request: "GET / HTTP/1.1", host: "echo.external.apisix"

jzhao20230918 commented 6 months ago

root@ip-10-0-2-59:/home/ubuntu# curl -fsL http://127.0.0.1:9092/v1/discovery/consul/dump | jq
{
  "services": {},
  "config": {
    "token": "",
    "keepalive": true,
    "weight": 1,
    "fetch_interval": 3,
    "timeout": {
      "connect": 2000,
      "read": 2000,
      "wait": 60
    },
    "servers": [
      "http://consul.internal:8500"
    ],
    "sort_type": "origin"
  }
}

jzhao20230918 commented 6 months ago

while another node with v3.8:

root@ip-10-0-2-59:/home/ubuntu# curl -fsL http://10.0.2.60:9092/v1/discovery/consul/dump | jq
{
  "config": {
    "keepalive": true,
    "weight": 1,
    "timeout": {
      "connect": 2000,
      "read": 2000,
      "wait": 60
    },
    "fetch_interval": 3,
    "token": "",
    "servers": [
      "http://consul.internal:8500"
    ]
  },
  "services": {
    "alertmanager": [
      {
        "port": 20928,
        "host": "10.0.2.69",
        "weight": 1
      }
    ],
    ...

jzhao20230918 commented 6 months ago

consul version 1.18

jzhao20230918 commented 6 months ago

The information you provided is insufficient to attempt reproduction of this bug

hello, any other info needed?

shreemaan-abhishek commented 6 months ago

@jzhao20230918 I haven't gotten the time to check this bug but the yaml configuration you shared isn't indented at all. Please fix it.

jzhao20230918 commented 6 months ago

apisix:
  node_listen: 9080
  enable_ipv6: false
  enable_control: true
  control:
    ip: "0.0.0.0"
    port: 9092
discovery:
  consul:
    servers:
      - "http://10.0.2.69:8500"
    sort_type: host_sort
    dump:
      path: "consul.dump"
      load_on_init: false
deployment:
  admin:
    allow_admin:  
      - 0.0.0.0/0
    admin_key:
      - name: "admin"
        key: edd1c9f034335f136f87ad84b625c8f1
        role: admin
      - name: "viewer"
        key: 4054f7cf07e344346cd3f287985e76a2
        role: viewer
  etcd:
    host:
      - "http://etcd.internal:2379"
    prefix: "/apisix"
    timeout: 30
plugin_attr:
  prometheus:
    export_addr:
      ip: "0.0.0.0"
      port: 9091

jzhao20230918 commented 6 months ago

might be related to https://github.com/apache/apisix/pull/10941

jzhao20230918 commented 5 months ago

@jzhao20230918 I haven't gotten the time to check this bug but the yaml configuration you shared isn't indented at all. Please fix it.

I found the root cause.

Line 525 was introduced in 3.9.0 (https://github.com/apache/apisix/blob/release/3.9.1/apisix/discovery/consul/init.lua#L525):

                local svc_address, svc_port = node.Service.Address, node.Service.Port
                -- if nodes is nil, new nodes table and set to up_services
                if not nodes then
                    nodes = core.table.new(1, 0)
                    up_services[service_name] = nodes
                end
                -- not store duplicate service IDs.
                local service_id = svc_address .. ":" .. svc_port

I had a service registered with Consul without a port, which caused svc_port to be nil. That took down the whole Consul service discovery function. After I registered the service correctly, everything works as before.
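To illustrate the failure mode described above, here is a minimal standalone sketch in plain Lua. It is not the APISIX code and not an upstream fix: `collect_nodes` is a hypothetical helper, and the sample entries only mimic the two fields the excerpt reads (`Service.Address` and `Service.Port`). It shows one way a nil check before the concatenation could skip a bad entry instead of aborting the whole loop.

```lua
-- Hypothetical helper (not part of APISIX): build "address:port" ids while
-- skipping entries whose Service.Address or Service.Port is missing.
local function collect_nodes(consul_nodes)
    local nodes, seen, skipped = {}, {}, 0
    for _, node in ipairs(consul_nodes) do
        local svc = node.Service or {}
        local svc_address, svc_port = svc.Address, svc.Port
        if svc_address == nil or svc_port == nil then
            -- Concatenating a nil value raises a runtime error, which would
            -- abort the loop and leave every service undiscovered.
            skipped = skipped + 1
        else
            local service_id = svc_address .. ":" .. svc_port
            -- do not store duplicate service ids
            if not seen[service_id] then
                seen[service_id] = true
                nodes[#nodes + 1] = {
                    host = svc_address, port = svc_port, weight = 1,
                }
            end
        end
    end
    return nodes, skipped
end

-- Sample data: the first entry mirrors the alertmanager node from the v3.8
-- dump; the second is a made-up entry registered without a port.
local sample = {
    { Service = { Address = "10.0.2.69", Port = 20928 } },
    { Service = { Address = "10.0.2.70" } },
    { Service = { Address = "10.0.2.69", Port = 20928 } }, -- duplicate id
}

local nodes, skipped = collect_nodes(sample)
print(#nodes, skipped) -- one node kept, one entry skipped
```

With a guard like this, a single service registered without a port would be skipped (and could be logged) while the remaining services stay discoverable. Whether skipping or falling back to a default port is the right behavior is a design choice for the maintainers.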

Karmenzind commented 4 months ago

I had a service registered with Consul without a port, which caused svc_port to be nil. That took down the whole Consul service discovery function. After I registered the service correctly, everything works as before.

Same here. @shreemaan-abhishek Any plan to fix it?