GUI / nginx-upstream-dynamic-servers

An nginx module to resolve domain names inside upstreams and keep them up to date.
MIT License

Too many reloads #13

Open gfrankliu opened 8 years ago

gfrankliu commented 8 years ago

ps -ef seems to show that the nginx workers get reloaded every time the DNS TTL expires. People normally set a low TTL, but that causes lots of unnecessary reloads even when nothing has changed. Is it possible to add a check so that if the results of the new lookup match the existing ones, the module just updates the timer and ignores the new lookup results?
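
A minimal sketch of the check being asked for, written in Python for brevity rather than in the module's C. The function names and the `state` dictionary are hypothetical and are not part of nginx-upstream-dynamic-servers; the addresses are documentation-range placeholders.

```python
def addresses_changed(current, resolved):
    """Return True when the resolved address set differs from the current one.

    Order and duplicates are ignored, because DNS servers often rotate records.
    """
    return set(current) != set(resolved)


def on_ttl_expired(state, resolved, ttl, now):
    """Hypothetical TTL handler: always re-arm the timer, rebuild only on change.

    Returns True when the upstream list had to be rebuilt.
    """
    state["next_lookup"] = now + ttl  # the timer is updated in both cases
    if not addresses_changed(state["addresses"], resolved):
        return False  # same answer as before: ignore the new lookup results
    state["addresses"] = sorted(set(resolved))
    state["rebuilds"] += 1
    return True
```

With this shape, a low TTL only costs a DNS query and a set comparison per expiry, and the expensive rebuild happens only when the answer actually changes.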

whatvn commented 8 years ago

Sounds like a segmentation fault.

gfrankliu commented 8 years ago

I checked the error_log and that is indeed the case. It is filled with errors like these:

2016/03/17 21:18:11 [alert] 4378#0: worker process 4403 exited on signal 11
2016/03/17 21:18:17 [alert] 4378#0: worker process 4404 exited on signal 11
2016/03/17 21:18:17 [alert] 4378#0: worker process 4408 exited on signal 11
2016/03/17 21:18:18 [alert] 4378#0: worker process 4409 exited on signal 11
2016/03/17 21:18:27 [alert] 4378#0: worker process 4410 exited on signal 11
2016/03/17 21:18:28 [alert] 4378#0: worker process 4413 exited on signal 11
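
To tell crashes apart from ordinary reloads, the alert lines can be counted per signal. This is a small Python sketch; the regular expression is an assumption based only on the log lines quoted above, and `count_worker_crashes` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
# 2016/03/17 21:18:11 [alert] 4378#0: worker process 4403 exited on signal 11
CRASH_RE = re.compile(
    r"\[alert\] \d+#\d+: worker process (\d+) exited on signal (\d+)"
)


def count_worker_crashes(log_text):
    """Count worker exits per signal number in nginx error_log text."""
    return Counter(int(m.group(2)) for m in CRASH_RE.finditer(log_text))
```

Signal 11 is SIGSEGV on Linux, so a non-zero count for 11 means the workers are crashing and being respawned, not being reloaded.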

It seems related to the other module I downloaded from GitHub. Here is how to reproduce:

1) Download nginx-1.9.12.tar.gz (I am using CentOS 6).
2) Download the health check module, https://github.com/yaoweibin/nginx_upstream_check_module, and apply the patch check_1.9.2+.patch.
3) Download the upstream dynamic servers module: https://github.com/GUI/nginx-upstream-dynamic-servers
4) Configure and compile nginx: ./configure --add-module=nginx-upstream-dynamic-servers --add-module=nginx_upstream_check_module
5) Install and test.

For testing, I added the following to the http block of nginx.conf:

resolver 8.8.8.8;
upstream pool1 {
  server www.yahoo.com resolve weight=10;
  keepalive 1024;
  check interval=3000 rise=1 fall=2 timeout=1000 type=http default_down=false;
  check_keepalive_requests 360;
  check_http_send "HEAD / HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n";
  check_http_expect_alive http_2xx http_3xx;
}
server {
  listen 9999 default_server;
  server_name test ;
  location / {
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $http_host;
    proxy_pass http://pool1;
  }
}

I started nginx and the error_log shows a worker crashing every second or two.

wandenberg commented 8 years ago

Hi @gfrankliu, can you check if this branch solves the problem? The nginx-upstream-dynamic-servers module was losing its references to other modules when trying to reinitialize the upstream configuration.

gfrankliu commented 8 years ago

Thanks @wandenberg. It is getting better now. I ran:

./configure --add-module=nginx-upstream-dynamic-servers --add-module=nginx_upstream_check_module --with-ipv6

and the config:

    resolver 8.8.8.8 ipv6=off;
    upstream pool1 {
      server www.yahoo.com resolve weight=10;
      keepalive 1024;
      check interval=3000 rise=1 fall=2 timeout=1000 type=http default_down=false;
      check_keepalive_requests 360;
      check_http_send "HEAD / HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n";
      check_http_expect_alive http_2xx http_3xx;
    }

Now it is crashing with different errors if I access the site:

2016/03/21 03:14:14 [error] 2158#0: connect() to [2001:4998:58:c02::a9]:80 failed (101: Network is unreachable)
2016/03/21 03:15:22 [alert] 2157#0: worker process 2158 exited on signal 11
2016/03/21 03:15:22 [error] 2172#0: connect() to [2001:4998:58:c02::a9]:80 failed (101: Network is unreachable)
2016/03/21 03:15:22 [error] 2172#0: disable check peer: [2001:4998:58:c02::a9]:80 
2016/03/21 03:15:57 [alert] 2157#0: worker process 2172 exited on signal 11
2016/03/21 03:16:11 [error] 2177#0: connect() to [2001:4998:58:c02::a9]:80 failed (101: Network is unreachable)
2016/03/21 03:17:03 [alert] 2157#0: worker process 2177 exited on signal 11
2016/03/21 03:17:05 [error] 2181#0: connect() to [2001:4998:58:c02::a9]:80 failed (101: Network is unreachable)

I think it is related to https://github.com/GUI/nginx-upstream-dynamic-servers/issues/12

Then I tried without --with-ipv6

./configure --add-module=nginx-upstream-dynamic-servers --add-module=nginx_upstream_check_module

and the config

    resolver 8.8.8.8;
    upstream pool1 {
      server www.yahoo.com resolve weight=10;
      keepalive 1024;
      check interval=3000 rise=1 fall=2 timeout=1000 type=http default_down=false;
      check_keepalive_requests 360;
      check_http_send "HEAD / HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n";
      check_http_expect_alive http_2xx http_3xx;
    }

Every request to the site still crashes:

2016/03/21 03:32:30 [alert] 4533#0: worker process 4534 exited on signal 11
2016/03/21 03:32:40 [alert] 4533#0: worker process 4543 exited on signal 11

wandenberg commented 8 years ago

Hi @gfrankliu, I will focus on the IPv6 issue later. Considering only IPv4, I don't think there is anything I can do on the nginx-upstream-dynamic-servers side to solve the integration problem with the ngx_http_upstream_check_module module. As far as I could follow the problem, it seems that it only initializes the peer shm pointer on this line, and the only way to reach this line is through the ngx_http_upstream_check_init_process function. So any change to the peers list after the worker is running will cause a problem. You can suggest to the owner of the module that the peer shm pointer be initialized in the ngx_http_upstream_check_add_peer function. I haven't tested it, but I guess this would solve the problem. Another suggestion is to reset the peers list every time the peer.init_upstream function is called.
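
The failure mode described here can be shown with a toy model. This is a Python sketch and not the check module's code: `ToyChecker`, `Peer`, and the `init_in_add_peer` flag are hypothetical names, and a `RuntimeError` stands in for the segfault a C module would hit when dereferencing an uninitialized pointer.

```python
class Peer:
    def __init__(self, addr):
        self.addr = addr
        self.shm = None  # stands in for the per-peer shared-memory pointer


class ToyChecker:
    def __init__(self, init_in_add_peer=False):
        self.peers = []
        self.init_in_add_peer = init_in_add_peer

    def add_peer(self, addr):
        peer = Peer(addr)
        if self.init_in_add_peer:
            # Suggested change: set up the peer state at registration time.
            peer.shm = {"down": False}
        self.peers.append(peer)
        return peer

    def init_process(self):
        # Runs once at worker start and only covers peers known at that time.
        for peer in self.peers:
            if peer.shm is None:
                peer.shm = {"down": False}

    def is_down(self, peer):
        if peer.shm is None:
            raise RuntimeError("uninitialized peer state for " + peer.addr)
        return peer.shm["down"]
```

In this model a peer added after `init_process()` has run fails on first use unless `init_in_add_peer=True`, which mirrors why a peers list that changes at runtime is a problem when initialization happens only at worker start.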

gfrankliu commented 8 years ago

I have opened issue https://github.com/yaoweibin/nginx_upstream_check_module/issues/89 but have had no reply so far. See my comment on https://github.com/GUI/nginx-upstream-dynamic-servers/issues/12 regarding IPv6. I think the IPv6 addresses were added by nginx during the very first DNS queries. Although none of the later DNS queries asked for AAAA records, the IPv6 addresses had already been added and were doing "damage".

gfrankliu commented 8 years ago

It seems there are some new updates to the code, so I cloned https://github.com/GUI/nginx-upstream-dynamic-servers.git and tested again:

1) ./configure --add-module=nginx-upstream-dynamic-servers --add-module=nginx_upstream_check_module --with-ipv6 This still fails because the first DNS query gets an AAAA record even though "resolver" has "ipv6=off", as discussed in https://github.com/GUI/nginx-upstream-dynamic-servers/issues/12. Even though the second DNS query only gets an A record, the AAAA record from the first query stays stuck in the upstream list, and the worker crashes every time I curl nginx. nginx seems to keep trying the IPv6 address from the very first DNS query.

2) If I compile without IPv6: ./configure --add-module=nginx-upstream-dynamic-servers --add-module=nginx_upstream_check_module It no longer seems to crash when I curl nginx. I can see the responses from Yahoo.

    resolver 8.8.8.8;
    upstream pool1 {
      server www.yahoo.com resolve weight=10;
      keepalive 1024;
      check interval=3000 rise=1 fall=2 timeout=1000 type=http default_down=false;
      check_keepalive_requests 360;
      check_http_send "HEAD / HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n";
      check_http_expect_alive http_2xx http_3xx;
    }

I then added a second server to the upstream block: server www.google.com resolve weight=5; Now the nginx worker crashes again every time I curl nginx.

wandenberg commented 8 years ago

@gfrankliu, as I said before, the crash you are seeing isn't coming from nginx-upstream-dynamic-servers. The problem is in nginx_upstream_check_module, which was not designed to have its server/IP list updated. You will have the same problem when compiling nginx without IPv6 support. As I wrote in my last answer on #12, the IP list is not "updated", it is "replaced" each time the module detects a change in the DNS answer. So, for example, if the first query returns one IPv4 and one IPv6 address and the second returns only an IPv4 address, the module will change the server list to contain only the IPv4 address.
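
The difference between "replaced" and "updated" can be sketched in a few lines of Python. Both function names are hypothetical and the addresses are documentation-range placeholders; this only illustrates the semantics described above and is not the module's implementation.

```python
def replace_servers(current, answer):
    """'Replace' semantics: on any change, the list becomes exactly the answer."""
    if set(current) == set(answer):
        return list(current)  # no change detected, keep the existing list
    return list(answer)


def merge_servers(current, answer):
    """'Update' semantics, shown for contrast: old entries are never dropped."""
    return list(current) + [a for a in answer if a not in current]


first_answer = ["203.0.113.10", "2001:db8::a9"]  # one IPv4 and one IPv6 address
second_answer = ["203.0.113.10"]                 # IPv4 only

servers = replace_servers([], first_answer)
servers = replace_servers(servers, second_answer)
print(servers)  # ['203.0.113.10']
```

Under replace semantics the IPv6 address from the first answer is gone after the second lookup, whereas `merge_servers` would have kept it in the list.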