apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0

help request: Data-plane cannot dynamically modify the upstreams address. You need to execute apisix reload to get the latest address #8947

Open jinjianming opened 1 year ago

jinjianming commented 1 year ago

Description

  1. I am running the decoupled (control plane / data plane) architecture, deployed on K8s. When I restart the Pod, the data plane cannot pick up the latest address, and requests fail with 502 Bad Gateway.

I found that sync_data() logs `res: null`: the result is null and the update failed.

2023/02/27 04:11:20 [info] 50#50: *381 [lua] resolver.lua:88: parse_domain(): dns resolver domain: apisix-control-plane-control-plane to 10.233.46.87, context: ngx.timer
2023/02/27 04:11:20 [info] 50#50: *383 [lua] v3.lua:852: request_chunk(): http request method: POST path: /watch body: {"create_request":{"key":"L2FwaXNpeC91cHN0cmVhbXM=","range_end":"L2FwaXNpeC91cHN0cmVhbXQ=","start_revision":47}} query: nil, context: ngx.timer
2023/02/27 04:11:20 [info] 50#50: *383 [lua] config_etcd.lua:392: sync_data(): waitdir key: /apisix/upstreams prev_index: 47, context: ngx.timer
2023/02/27 04:11:20 [info] 50#50: *383 [lua] config_etcd.lua:393: `sync_data(): res: null`, err: timeout, context: ngx.timer
2023/02/27 04:11:20 [info] 50#50: *383 [lua] v3.lua:76: choose_endpoint(): choose endpoint: https://apisix-control-plane-control-plane:9280, context: ngx.timer
2023/02/27 04:11:20 [info] 50#50: *383 [lua] client.lua:123: dns_parse(): dns resolve apisix-control-plane-control-plane, result: {"name":"apisix-control-plane-control-plane.ingress-apisix.svc.cluster.local","type":1,"class":1,"address":"10.233.46.87","ttl":30,"section":1}, context: ngx.timer
2023/02/27 04:11:20 [info] 50#50: *383 [lua] resolver.lua:84: parse_domain(): parse addr: {"name":"apisix-control-plane-control-plane.ingress-apisix.svc.cluster.local","type":1,"class":1,"section":1,"ttl":30,"address":"10.233.46.87"}, context: ngx.timer
2023/02/27 04:11:20 [info] 50#50: *383 [lua] resolver.lua:85: parse_domain(): resolver: ["169.254.25.10"], context: ngx.timer
2023/02/27 04:11:20 [info] 50#50: *383 [lua] resolver.lua:86: parse_domain(): host: apisix-control-plane-control-plane, context: ngx.timer
  1. The requested address is still the old one: the address in etcd is 10.233.64.32, but the data plane actually forwards to 10.233.64.29.

    2023/02/27 07:48:33 [info] 68#68: *178638 [lua] balancer.lua:196: pick_server(): ctx: {"upstream_conf":{"labels":{"managed-by":"apisix-ingress-controller"},"pass_host":"pass","type":"roundrobin","nodes":[{"upstream_host":"10.233.64.29","priority":0,"weight":100,"port":80,"host":"10.233.64.29"}],"parent":{"key":"\/apisix\/upstreams\/16690fc3","value":"table: 0x7f4ef61c19f8","createdIndex":30,"clean_handlers":{},"has_domain":false,"modifiedIndex":335},"hash_on":"vars","desc":"Created by apisix-ingress-controller, DO NOT modify it manually","name":"default_js-design-nginx_80","scheme":"http","nodes_ref":"table: 0x7f4ef61c1c28","update_time":1677483100,"id":"16690fc3","create_time":1677470559,"original_nodes":"table: 0x7f4ef61c1c28"},"conf_type":"route","upstream_version":"335#table: 0x7f4ef61c19f8","upstream_key":"16690fc3","var":{"_cache":{"real_request_uri":"\/","request_uri":"\/","uri":"\/","upstream_scheme":"http","request_method":"GET","remote_addr":"192.168.0.48","host":"cjyoumen.com"},"_ctx":{"upstream_conf":"table: 0x7f4ef61c19f8","conf_type":"route","upstream_version":"335#table: 0x7f4ef61c19f8","upstream_key":"16690fc3","var":"table: 0x7f4eec0ae178","matched_upstream":"table: 0x7f4ef61c19f8","route_id":"27861e91","route_name":"default_js-design-web_rule1","conf_version":331,"matched_route":{"update_count":0,"modifiedIndex":331,"value":{"labels":{"managed-by":"apisix-ingress-controller"},"uris":["\/*"],"upstream_id":"16690fc3","hosts":["cjyoumen.com"],"desc":"Created by apisix-ingress-controller, DO NOT modify it manually","name":"default_js-design-web_rule1","priority":0,"update_time":1677483010,"id":"27861e91","create_time":1677470560,"status":1},"createdIndex":32,"key":"\/apisix\/routes\/27861e91","clean_handlers":{},"has_domain":false,"orig_modifiedIndex":331},"conf_id":"27861e91","curr_req_matched":"table: 0x7f4eec4e0d80","plugins":{},"upstream_scheme":"http"},"_request":"cdata<void *>: 0x5596730cfba0"},"matched_upstream":"table: 
0x7f4ef61c19f8","route_id":"27861e91","route_name":"default_js-design-web_rule1","conf_version":331,"matched_route":"table: 0x7f4ef61c3118","conf_id":"27861e91","curr_req_matched":{"_path":"\/*","_host":"cjyoumen.com",":ext":"","_method":"GET"},"plugins":"table: 0x7f4eec4e2490","upstream_scheme":"http"}, client: 192.168.0.48, server: _, request: "GET / HTTP/1.1", host: "cjyoumen.com"
    2023/02/27 07:48:33 [info] 68#68: *178638 [lua] balancer.lua:384: run(): proxy request to 10.233.64.29:80 while connecting to upstream, client: 192.168.0.48, server: _, request: "GET / HTTP/1.1", host: "cjyoumen.com"
    2023/02/27 07:48:36 [error] 68#68: *178638 connect() failed (113: No route to host) while connecting to upstream, client: 192.168.0.48, server: _, request: "GET / HTTP/1.1", upstream: "http://10.233.64.29:80/", host: "cjyoumen.com"
    ### ETCD 
    I have no name!@apisix-control-plane-etcd-0:/opt/bitnami/etcd$ etcdctl get /apisix/upstreams/16690fc3 --prefix 
    /apisix/upstreams/16690fc3
    {"hash_on":"vars","desc":"Created by apisix-ingress-controller, DO NOT modify it manually","scheme":"http","labels":{"managed-by":"apisix-ingress-controller"},"create_time":1677470559,"pass_host":"pass","update_time":1677483910,"nodes":[{"priority":0,"host":"10.233.64.32","weight":100,"port":80}],"type":"roundrobin","name":"default_js-design-nginx_80","id":"16690fc3"}
  2. The dashboard shows that the upstream is up to date, but the data plane does not automatically refresh to the latest address.

  3. Do I need to add 'sni' or some other configuration?
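
The `/watch` request body in the log above carries base64-encoded etcd keys. As a minimal sketch (the helper names `decode_key` and `prefix_range_end` are made up for illustration and are not part of APISIX), decoding them shows that the data plane is watching the whole `/apisix/upstreams` prefix, with `range_end` being the prefix with its last byte incremented:

```python
import base64


def decode_key(b64_key: str) -> str:
    """Decode a base64-encoded etcd key as sent in the v3 JSON gateway body."""
    return base64.b64decode(b64_key).decode("utf-8")


def prefix_range_end(prefix: str) -> str:
    """Range end for a prefix watch: the prefix with its last byte plus one.

    Assumes the last byte is not 0xff, which holds for the key in this log.
    """
    raw = bytearray(prefix.encode("utf-8"))
    raw[-1] += 1
    return raw.decode("utf-8")


# Values copied from the create_request in the log line above.
key = decode_key("L2FwaXNpeC91cHN0cmVhbXM=")
range_end = decode_key("L2FwaXNpeC91cHN0cmVhbXQ=")

print(key)        # /apisix/upstreams
print(range_end)  # /apisix/upstreamt
print(range_end == prefix_range_end(key))  # True
```

So the watch request itself targets the right keys; the log only shows that it returned `res: null` with `err: timeout`.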

Environment

tokers commented 1 year ago

We need some information for troubleshooting:

  1. How did you configure the Route, Upstream, and other objects?
  2. Did you use the K8s service discovery module?
  3. What's the DNS resolver of your APISIX data plane?

jinjianming commented 1 year ago

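We need some information for troubleshooting:

  1. How did you configure the Route, Upstream, and other objects?
  2. Did you use the K8s service discovery module?
  3. What's the DNS resolver of your APISIX data plane?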

1. I installed the ingress-controller and published routing rules through ApisixRoute. The backend is an Upstream generated automatically from the Service name:

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: js-design-web
spec:
  http:
  - name: rule1
    match:
      hosts:
      - cjyoumen.com
      paths:
      - /*
    backends:
       - serviceName: js-design-nginx
         servicePort: 80

2. This is implemented through the ingress-controller.

3. DNS uses CoreDNS:

      dns_resolver:
        - coredns.kube-system.svc.cluster.local
      dns_resolver_valid: 30
      resolver_timeout: 5

jinjianming commented 1 year ago

The data-plane log shows that the control-plane domain resolves to the correct address, 10.233.46.87:

2023/02/27 09:53:22 [info] 50#50: *377 [lua] resolver.lua:88: parse_domain(): dns resolver domain: apisix-control-plane-control-plane to `10.233.46.87`, context: ngx.timer

root@node01:/home/apisix/apisix-1.1.0# kubectl get svc -n ingress-apisix|grep 10.233.46.87
apisix-control-plane-control-plane        NodePort    10.233.46.87    <none>        9280:34054/TCP          5h53m

jinjianming commented 1 year ago

@tokers On first startup, as long as the upstream does not change, everything works normally. Once the upstream changes, the data plane cannot load the latest address. What is the difference between the initial load and the subsequent POST /watch that causes requests to fail?
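
One way to confirm the stale state described here is to compare the nodes the data plane is using (from the `pick_server()` log) with the nodes stored in etcd. The following is only a sketch: the function names are hypothetical, and the two documents are trimmed copies of the values posted earlier in this thread.

```python
import json


def node_addresses(upstream: dict) -> set:
    """Return the (host, port) pairs of an upstream's nodes."""
    return {(node["host"], node["port"]) for node in upstream["nodes"]}


def diff_upstream(data_plane: dict, etcd: dict) -> dict:
    """Report nodes only the data plane has (stale) and nodes only etcd has (missing)."""
    dp_nodes = node_addresses(data_plane)
    etcd_nodes = node_addresses(etcd)
    return {
        "stale": sorted(dp_nodes - etcd_nodes),
        "missing": sorted(etcd_nodes - dp_nodes),
    }


# Trimmed from the balancer.lua pick_server() log on the data plane.
dp_upstream = json.loads(
    '{"id": "16690fc3", "nodes": [{"host": "10.233.64.29", "port": 80, "weight": 100}]}'
)
# Trimmed from the `etcdctl get /apisix/upstreams/16690fc3` output.
etcd_upstream = json.loads(
    '{"id": "16690fc3", "nodes": [{"host": "10.233.64.32", "port": 80, "weight": 100}]}'
)

result = diff_upstream(dp_upstream, etcd_upstream)
print(result)
```

With the values from this issue, `stale` contains 10.233.64.29 and `missing` contains 10.233.64.32, which matches the "No route to host" error against the old Pod IP.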

jinjianming commented 1 year ago

@tokers Hello, do you have any other suggestions for troubleshooting?

jinjianming commented 1 year ago

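@tokers May I ask: in the decoupled architecture, is it mandatory to use Kubernetes service discovery or some other service discovery?

I tried configuring kubernetes discovery on the DP, and it does update the upstream addresses. However, routes and upstream addresses created through the admin API are still not propagated to the DP and do not take effect: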

discovery:
  kubernetes: {}

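I have the same problem with docker-compose. In https://github.com/apache/apisix/issues/9049 I described in detail how to reproduce the issue and attached my deployment files. Could you help identify where the problem lies? Many thanks.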

winston0410 commented 1 year ago

@jinjianming were you able to fix this issue in the end? I ran into exactly the same issue. Is it possible to force APISIX to reload?

github-actions[bot] commented 2 months ago

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

olexiyb commented 1 month ago

I see the same issue: the upstream fails to refresh after a pod restart.

olexiyb commented 1 month ago

I added resolveGranularity: service to resolve this:

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: block-explorer
spec:
  http:
    - name: http
      priority: 0
      match:
        hosts:
          - host
        paths:
          - /*
      backends:
        - serviceName: <service>
          servicePort: <port>
          resolveGranularity: service

Yes, we lose APISIX load balancing, but that is OK for us.
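
To illustrate why this workaround sidesteps the stale-address problem, here is a rough Python model of the two `resolveGranularity` modes. It is not the ingress controller's code: `upstream_nodes` is a hypothetical helper, and the ClusterIP `10.233.0.10` is an invented placeholder. The assumption is that `endpoint` puts Pod IPs into the upstream while `service` puts the single Service ClusterIP there.

```python
def upstream_nodes(granularity: str, cluster_ip: str, pod_ips: list, port: int) -> list:
    """Model which node addresses end up in the APISIX upstream."""
    if granularity == "service":
        # One node: the Service ClusterIP. Kubernetes spreads traffic across Pods.
        return [{"host": cluster_ip, "port": port, "weight": 100}]
    if granularity == "endpoint":
        # One node per Pod IP. APISIX balances, but must see every Pod change.
        return [{"host": ip, "port": port, "weight": 100} for ip in pod_ips]
    raise ValueError(f"unknown resolveGranularity: {granularity}")


CLUSTER_IP = "10.233.0.10"  # hypothetical placeholder, not from the issue

# Pod IP before and after the restart, as reported earlier in this thread.
before = upstream_nodes("endpoint", CLUSTER_IP, ["10.233.64.29"], 80)
after = upstream_nodes("endpoint", CLUSTER_IP, ["10.233.64.32"], 80)
print(before == after)  # False: the data plane must receive the update

svc_before = upstream_nodes("service", CLUSTER_IP, ["10.233.64.29"], 80)
svc_after = upstream_nodes("service", CLUSTER_IP, ["10.233.64.32"], 80)
print(svc_before == svc_after)  # True: the upstream is unchanged by the restart
```

In this model the upstream with `service` granularity never changes when a Pod restarts, so a data plane that misses watch updates keeps working, at the cost of leaving per-Pod balancing to Kubernetes.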