alibaba / tengine

A distribution of Nginx with some advanced features
https://tengine.taobao.org
BSD 2-Clause "Simplified" License

upstream_dynamic and upstream_check not mutually compatible #659

Open XavM opened 8 years ago

XavM commented 8 years ago

Hello,

The upstream_dynamic (dynamic_resolve) and upstream_check modules seem to be incompatible with each other:

When a node is flagged as "down" by the check module, Tengine still forwards requests to it, resulting in HTTP 502 errors.

When a new IP is added to the domain (through an external DNS server), the dynamic_resolve module adds it to the upstream list and forwards requests to it, but the check module does not show it in the check_status HTTP endpoint.

To reproduce:

Configuration:

$> cat nginx.conf 
http {

        upstream nodes  {

                dynamic_resolve fallback=stale fail_timeout=30s;

                server test.dev:8080;

                check interval=3000 default_down=false rise=2 fall=2 timeout=2000 type=http;
                check_keepalive_requests 60;
                check_http_send "GET / HTTP/1.0\r\nHost: test.dev\r\n\r\n";
                check_http_expect_alive http_2xx;
        }

        server {
                listen 80;

                location /nodes/ {
                        proxy_pass  http://nodes/;
                }

                location /status {
                        check_status;
                }

        }
}
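The `rise`, `fall` and `default_down` parameters above decide when a peer's state flips. The sketch below models one peer under the module's documented semantics: a peer is marked down after `fall` consecutive failed checks and up again after `rise` consecutive successful ones, and `default_down=false` means peers start in the up state. The `CheckedPeer` class is a hypothetical illustration in Python, not the module's C implementation.

```python
from dataclasses import dataclass


@dataclass
class CheckedPeer:
    """Hypothetical model of one peer's health-check state (not Tengine's C code)."""
    rise_threshold: int = 2  # from "rise=2"
    fall_threshold: int = 2  # from "fall=2"
    down: bool = False       # from "default_down=false": peers start up
    rise: int = 0            # consecutive successful checks
    fall: int = 0            # consecutive failed checks

    def record(self, ok: bool) -> None:
        """Update the counters after one check (one check per "interval")."""
        if ok:
            self.rise += 1
            self.fall = 0
            if self.down and self.rise >= self.rise_threshold:
                self.down = False
        else:
            self.fall += 1
            self.rise = 0
            if not self.down and self.fall >= self.fall_threshold:
                self.down = True


peer = CheckedPeer()
for result in [True, False, False, True, True]:
    peer.record(result)
    print(result, "down" if peer.down else "up", peer.rise, peer.fall)
```

Under this model a long-dead peer accumulates a large `fall` count with `rise` stuck at 0, which is consistent with the 192.168.0.21 entry in the status output (`"rise": 0, "fall": 278`).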

When nginx was first started, the "test.dev" domain resolved to only two IPs (192.168.0.21 and 192.168.0.22).

A third IP (192.168.0.30) was then added to the "test.dev" domain:

$> dig @127.0.0.1 +short test.dev
192.168.0.21
192.168.0.22
192.168.0.30

The check_status endpoint still returns only the IPs known at startup:

$> curl 127.0.0.1/status?format=json
Tue Oct 27 09:51:20 EDT 2015
{"servers": {
  "total": 2,
  "generation": 1,
  "server": [
    {"index": 0, "upstream": "nodes", "name": "192.168.0.21:8080", "status": "down", "rise": 0, "fall": 278, "type": "http", "port": 0},
    {"index": 1, "upstream": "nodes", "name": "192.168.0.22:8080", "status": "up", "rise": 568, "fall": 0, "type": "http", "port": 0}
  ]
}}
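To spot this mismatch without reading the JSON by hand, a small script can compare what DNS returns with what check_status reports. This is a minimal sketch, assuming Python 3.9+ and that the status body is the JSON shown above without the leading date line. Note that `socket.getaddrinfo` uses the system resolver, which is not necessarily the server at 127.0.0.1 queried by `dig`; the function names are hypothetical.

```python
import json
import socket


def resolve_ipv4(host: str, port: int) -> set[str]:
    """Return the IPv4 addresses the system resolver currently gives for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def unchecked_peers(resolved: set[str], status_json: str, port: int) -> set[str]:
    """Return resolved "ip:port" peers that have no entry in check_status."""
    servers = json.loads(status_json)["servers"]["server"]
    checked = {s["name"] for s in servers}
    return {f"{ip}:{port}" for ip in resolved} - checked


# Sample data taken from the status output and dig result above.
status = """{"servers": {"total": 2, "generation": 1, "server": [
  {"index": 0, "upstream": "nodes", "name": "192.168.0.21:8080", "status": "down"},
  {"index": 1, "upstream": "nodes", "name": "192.168.0.22:8080", "status": "up"}]}}"""
dns = {"192.168.0.21", "192.168.0.22", "192.168.0.30"}
print(sorted(unchecked_peers(dns, status, 8080)))  # ['192.168.0.30:8080']
```

Against a live server, `dns` would come from `resolve_ipv4("test.dev", 8080)` and `status` from the body of `/status?format=json`.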

When an IP is reported as "status": "down", Tengine keeps sending requests to it:

$> curl -I 127.0.0.1/nodes/
HTTP/1.1 502 Bad Gateway
Server: Tengine/2.1.1
Date: Tue, 27 Oct 2015 14:02:29 GMT
Content-Type: text/html
Content-Length: 600
Connection: keep-alive
$> tail /var/log/nginx/error.log 
2015/10/27 10:02:25 [error] 4794#0: check time out with peer: 192.168.0.21:8080 
2015/10/27 10:02:29 [error] 4794#0: *41 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: , request: "HEAD /nodes/ HTTP/1.1", upstream: "http://192.168.0.21:8080/", host: "127.0.0.1"
2015/10/27 10:02:29 [error] 4794#0: *41 no live upstreams while connecting to upstream, client: 127.0.0.1, server: , request: "HEAD /nodes/ HTTP/1.1", upstream: "http://nodes/", host: "127.0.0.1"
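The log excerpt can be cross-checked against the status output mechanically: collect the upstream addresses from `connect() failed` lines and keep those that check_status reports as down. A minimal sketch, assuming the error.log format shown above and a status snapshot taken at about the same time as the log lines (the snapshot above is from 09:51 EDT, the log from 10:02 EDT); the regex and function names are hypothetical.

```python
import json
import re

# Captures "IP:PORT" from: connect() failed ... upstream: "http://IP:PORT/"
CONNECT_FAILED = re.compile(r'connect\(\) failed .*? upstream: "http://([^/"]+)/')


def down_peers(status_json: str) -> set[str]:
    """Peers that check_status currently reports as down."""
    servers = json.loads(status_json)["servers"]["server"]
    return {s["name"] for s in servers if s["status"] == "down"}


def requests_to_down_peers(log_lines: list[str], status_json: str) -> list[str]:
    """Upstream addresses of failed connects that went to a peer marked down."""
    down = down_peers(status_json)
    hits = []
    for line in log_lines:
        match = CONNECT_FAILED.search(line)
        if match and match.group(1) in down:
            hits.append(match.group(1))
    return hits


status = """{"servers": {"server": [
  {"name": "192.168.0.21:8080", "status": "down"},
  {"name": "192.168.0.22:8080", "status": "up"}]}}"""
log = [
    '2015/10/27 10:02:25 [error] 4794#0: check time out with peer: 192.168.0.21:8080 ',
    '2015/10/27 10:02:29 [error] 4794#0: *41 connect() failed (111: Connection refused) '
    'while connecting to upstream, client: 127.0.0.1, server: , request: "HEAD /nodes/ '
    'HTTP/1.1", upstream: "http://192.168.0.21:8080/", host: "127.0.0.1"',
]
print(requests_to_down_peers(log, status))  # ['192.168.0.21:8080']
```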
cfsego commented 8 years ago

Yes, upstream_check is too old and is not compatible with any module that changes upstreams dynamically. We knew about this before 2.1.1 was released, but we cannot spare much time on it. If someone helps us redesign this module, we would appreciate it.
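The incompatibility described in this reply can be pictured with a deliberately simplified model: a checker that registers peers once, when the configuration is loaded, and a resolver that later replaces the peer list. The `ToyUpstream` class is hypothetical and does not reflect Tengine's actual data structures; it only shows why a peer added after startup never gets a check entry.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUpstream:
    """Hypothetical model, not Tengine code: static checker vs. dynamic peer list."""
    peers: list[str]
    # peer name -> "down" flag, filled once at startup by the checker
    check_table: dict[str, bool] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # The checker registers only the peers that exist at config load.
        self.check_table = {p: False for p in self.peers}

    def re_resolve(self, new_peers: list[str]) -> None:
        # The resolver swaps in a new peer list; the check table is untouched.
        self.peers = list(new_peers)

    def unchecked(self) -> list[str]:
        """Peers that receive traffic but have no health-check entry."""
        return [p for p in self.peers if p not in self.check_table]


up = ToyUpstream(["192.168.0.21:8080", "192.168.0.22:8080"])
up.re_resolve(["192.168.0.21:8080", "192.168.0.22:8080", "192.168.0.30:8080"])
print(up.unchecked())  # ['192.168.0.30:8080']
```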

gfrankliu commented 8 years ago

With Tengine 2.1.2, are these two modules still incompatible? If so, is there any workaround?

- ngx_http_upstream_check_module (check module)
- ngx_http_upstream_dynamic_module (dynamic_resolve module)
tony612 commented 7 years ago

@gfrankliu It seems they are still incompatible. We're using Tengine/2.2.0 (nginx/1.8.1) and still get these kinds of errors. :(

ArighnaIITG commented 3 years ago

Has this issue been solved yet?