UlricE / pen

UDP load balancing #19

Open silviud opened 9 years ago

silviud commented 9 years ago

The UDP load balancer algorithm doesn't account for dead servers. Example:

./pen -fU 8080 127.0.0.1:8001 127.0.0.1:8002

If servers are up on ports 8001 and 8002, traffic is forwarded. However, if the server on port 8001 is down, pen neither detects it nor stops forwarding traffic to it.

Any plans to add this kind of detection?

thanks!

-silviu

UlricE commented 9 years ago

Pen is blind to what happens to UDP traffic after it is forwarded. If there is a way to detect that a back end is nonresponsive (e.g. a DNS server that doesn't reply), you can use a script to monitor them. Here's an old example for HTTP which can trivially be updated for other protocols:

http://siag.nu/hypermail/pen/0038.html
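For the DNS case mentioned above, such a check could look roughly like the following. This is only a minimal sketch and not taken from the linked example: the function names `build_dns_query` and `dns_alive`, the default query name and the timeout are all assumptions made for illustration. It sends one A-record query over UDP and treats a missing or mismatched reply as "nonresponsive".

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def dns_alive(host: str, port: int = 53, name: str = "example.com",
              timeout: float = 2.0) -> bool:
    """Return True if the server answers the query before the timeout."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (host, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as other socket errors
            return False
    # A genuine reply echoes the query ID in its first two bytes
    return len(reply) >= 2 and reply[:2] == query[:2]
```

A single lost datagram would make this report a healthy server as dead, so a real monitor would probably want to retry a couple of times before acting on the result.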

silviud commented 9 years ago

Hi,

I tried with the blacklist but had only partial success. This is what happened:

  1. Started pen: ./pen -fU -a -dd 8080 127.0.0.1:10000 127.0.0.1:10001 127.0.0.1:10002 -C localhost:9000
  2. Connected a client, which received a response from server 1 (port 10001)
  3. Blacklisted server 1: ./penctl localhost:9000 server 1 blacklist 30
  4. Connected the client again, but it got nowhere

    2015-10-14 10:55:18: add_client: received 4 bytes from client
    2015-10-14 10:55:18: Client 127.0.0.1 has index 0
    2015-10-14 10:55:18: store_client returns 0
    2015-10-14 10:55:18: incrementing connections_used to 1 for connection 0
    2015-10-14 10:55:18: store_conn: conn = 0, downfd = 4, connections_used = 1
    2015-10-14 10:55:18: store_conn returns 0
    2015-10-14 10:55:18: match_acl_ipv4(0, 16777343)
    2015-10-14 10:55:18: Will try previous server 1 for client 0
    2015-10-14 10:55:18: Trying server 1 for connection 0 at time 1444834518
    2015-10-14 10:55:18: Server 1 is blacklisted
    2015-10-14 10:55:18: failover_server(0)
    2015-10-14 10:55:18: Won't failover from abuse server
    2015-10-14 10:55:18: decrementing connections_used to 0 for connection 0
    2015-10-14 10:55:18: close_conn: Closing connection 0 to server -3; connections_used = 0
        Read 0 from client, wrote 0 to server
        Read 0 from server, wrote 0 to client
    2015-10-14 10:55:18: No failover server found, giving up

The client does not work until the blacklist window expires, at which point it reconnects to port 10001.

My expectation was that it would fail over to a different port, since server 1 was blacklisted.
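To make the expected behaviour concrete, here is a small sketch of the selection logic I had in mind. It is not Pen's actual implementation: the function name `pick_server`, the `blacklisted_until` mapping and the "try the next index in order" rule are assumptions made purely for illustration.

```python
import time
from typing import Dict, Optional


def pick_server(previous: int, blacklisted_until: Dict[int, float],
                n_servers: int, now: Optional[float] = None) -> Optional[int]:
    """Return the index of a usable server, preferring the previous one.

    blacklisted_until maps a server index to the time its blacklist expires.
    Returns None when every server is currently blacklisted.
    """
    if now is None:
        now = time.time()
    # Start at the client's previous server and walk forward, wrapping around
    for offset in range(n_servers):
        candidate = (previous + offset) % n_servers
        if blacklisted_until.get(candidate, 0.0) <= now:
            return candidate
    return None
```

With three servers and server 1 blacklisted for 30 seconds, a client that previously used server 1 would be sent to server 2 during the window and back to server 1 once it has expired.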

UlricE commented 9 years ago

That would be a reasonable expectation, I think. Let me try to reproduce the problem and see if it is a bug.

UlricE commented 9 years ago

The latest version in Git fixes this failover problem. Here's what I get:

First prepare three pens proxying dns requests to google (to get something to test against) and verify that they work:

    ulric@debtest:~/Git/pen$ ./pen -U 127.0.0.1:10000 8.8.8.8:53
    ulric@debtest:~/Git/pen$ ./pen -U 127.0.0.1:10001 8.8.8.8:53
    ulric@debtest:~/Git/pen$ ./pen -U 127.0.0.1:10002 8.8.8.8:53
    ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 10000 +short siag.nu
    194.9.95.65
    ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 10001 +short siag.nu
    194.9.95.65
    ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 10002 +short siag.nu
    194.9.95.65

Then start Pen, same command line as you used above:

ulric@debtest:~/Git/pen$ ./pen -fU -a -dd 8080 127.0.0.1:10000 127.0.0.1:10001 127.0.0.1:10002 -C localhost:9000 > log 2>&1

And from another terminal, test failover:

    ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 8080 +short siag.nu
    194.9.95.65
    ulric@debtest:~/Git/pen$ ./penctl localhost:9000 server 1 blacklist 30
    ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 8080 +short siag.nu
    194.9.95.65

So that looks good. The log says:

    2015-11-02 10:05:30: add_client: received 36 bytes from client
    2015-11-02 10:05:30: Resetting client stats for slot 0
    2015-11-02 10:05:30: Client 127.0.0.1 has index 0
    2015-11-02 10:05:30: store_client returns 0
    2015-11-02 10:05:30: incrementing connections_used to 1 for connection 0
    2015-11-02 10:05:30: expanding fd2conn to 10006 bytes
    2015-11-02 10:05:30: store_conn: conn = 0, downfd = 6, connections_used = 1
    2015-11-02 10:05:30: store_conn returns 0
    2015-11-02 10:05:30: match_acl_ipv4(0, 16777343)
    2015-11-02 10:05:30: Will try previous server -3 for client 0
    2015-11-02 10:05:30: Trying server 1 for connection 0 at time 1446455130
    2015-11-02 10:05:30: match_acl_ipv4(0, 16777343)
    2015-11-02 10:05:30: socket returns 8, socket_errno=0
    2015-11-02 10:05:30: Connecting to 127.0.0.1
    2015-11-02 10:05:30: Family: AF_INET
    2015-11-02 10:05:30: Port: 10001
    2015-11-02 10:05:30: Address: 127.0.0.1
    2015-11-02 10:05:30: connect (upfd = 8) returns 0, errno = 0, socket_errno = 0
    2015-11-02 10:05:30: epoll_event_add(fd=8, events=65536)
    2015-11-02 10:05:30: epoll_event_ctl(fd=8, events=65536, op=1)
    2015-11-02 10:05:30: Successful connect to server 1
        conns[0].client = 0
        conns[0].server = 1
    2015-11-02 10:05:30: Setting server 1 for client 0
    2015-11-02 10:05:30: add_client: wrote 36 bytes to socket 8
    2015-11-02 10:05:30: epoll_event_fd(revents=0x7fffce832c94)
    2015-11-02 10:05:30: epoll_event_wait()
    2015-11-02 10:05:30: epoll_wait returns 1
    2015-11-02 10:05:30: After event_wait()
    2015-11-02 10:05:30: epoll_event_fd(revents=0x7fffce832c94)
    2015-11-02 10:05:30: event_fd returns fd=8, events=65536
    2015-11-02 10:05:30: want to read from upstream socket 8 of connection 0
    2015-11-02 10:05:30: copy_down: recv(8, 0x7fffce82ac30, 32768, 0) returns 52
    2015-11-02 10:05:30: copy_down sending 52 bytes to socket 6
    2015-11-02 10:05:30: epoll_event_delete(fd=8)
    2015-11-02 10:05:30: decrementing connections_used to 0 for connection 0
    2015-11-02 10:05:30: close_conn: Closing connection 0 to server 1; connections_used = 0
        Read 0 from client, wrote 0 to server
        Read 0 from server, wrote 0 to client
    [...]
    2015-11-02 10:05:37: do_cmd(server 1 blacklist 30 , 0x404470, 0x7fffce831c5c)
    [...]
    2015-11-02 10:05:42: add_client: received 36 bytes from client
    2015-11-02 10:05:42: Client 127.0.0.1 has index 0
    2015-11-02 10:05:42: store_client returns 0
    2015-11-02 10:05:42: incrementing connections_used to 1 for connection 0
    2015-11-02 10:05:42: store_conn: conn = 0, downfd = 6, connections_used = 1
    2015-11-02 10:05:42: store_conn returns 0
    2015-11-02 10:05:42: match_acl_ipv4(0, 16777343)
    2015-11-02 10:05:42: Will try previous server 1 for client 0
    2015-11-02 10:05:42: Trying server 1 for connection 0 at time 1446455142
    2015-11-02 10:05:42: Server 1 is blacklisted
    2015-11-02 10:05:42: failover_server(0): server = 1
    2015-11-02 10:05:42: Intend to try server 2
    2015-11-02 10:05:42: Trying server 2 for connection 0 at time 1446455142
    2015-11-02 10:05:42: match_acl_ipv4(0, 16777343)
    2015-11-02 10:05:42: socket returns 8, socket_errno=0
    2015-11-02 10:05:42: Connecting to 127.0.0.1
    2015-11-02 10:05:42: Family: AF_INET
    2015-11-02 10:05:42: Port: 10002
    2015-11-02 10:05:42: Address: 127.0.0.1
    2015-11-02 10:05:42: connect (upfd = 8) returns 0, errno = 0, socket_errno = 0
    2015-11-02 10:05:42: epoll_event_add(fd=8, events=65536)
    2015-11-02 10:05:42: epoll_event_ctl(fd=8, events=65536, op=1)
    2015-11-02 10:05:42: Successful connect to server 2
        conns[0].client = 0
        conns[0].server = 1
    2015-11-02 10:05:42: Setting server 2 for client 0
    2015-11-02 10:05:42: add_client: wrote 36 bytes to socket 8
    2015-11-02 10:05:42: epoll_event_fd(revents=0x7fffce832c94)
    2015-11-02 10:05:42: epoll_event_wait()
    2015-11-02 10:05:42: epoll_wait returns 1
    2015-11-02 10:05:42: After event_wait()
    2015-11-02 10:05:42: epoll_event_fd(revents=0x7fffce832c94)
    2015-11-02 10:05:42: event_fd returns fd=8, events=65536
    2015-11-02 10:05:42: want to read from upstream socket 8 of connection 0
    2015-11-02 10:05:42: copy_down: recv(8, 0x7fffce82ac30, 32768, 0) returns 52
    2015-11-02 10:05:42: copy_down sending 52 bytes to socket 6
    2015-11-02 10:05:42: epoll_event_delete(fd=8)
    2015-11-02 10:05:42: decrementing connections_used to 0 for connection 0
    2015-11-02 10:05:42: close_conn: Closing connection 0 to server 2; connections_used = 0
        Read 0 from client, wrote 0 to server
        Read 0 from server, wrote 0 to client

UlricE commented 9 years ago

Closing since the fix is in 0.31.1.

ccs10021 commented 8 years ago

Hi, I'm new to Pen and have just started playing around with DNS load balancing. I can't seem to get the load balancer to adjust for failures within the load-balancing pool.

I am trying to work through your examples from above to get a better handle on health checks and blacklisting.

I have set up this config based on your examples:

    ./pen -U 127.0.0.1:10000 10.10.10.1:53
    ./pen -U 127.0.0.1:10001 10.10.10.4:53
    ./pen -fU -a -dd 53 127.0.0.1:10000 127.0.0.1:10001 127.0.0.1:10002 -C localhost:9000 > log 2>&1
    ./penctl localhost:9000 server 1 blacklist 30

Getting this error on blacklisting:

    [root@xxx-xxx01 pen-0.31.1]# penctl localhost:9000 server 1 blacklist 30
    error connecting to server

Server is up though:

    [root@xxx-xxx01 pen-0.31.1]# dig @127.0.0.1 -p 10000 +short siag.nu
    194.9.95.65

Any ideas?

Also, I've only been seeing empty log files so far. Maybe I am looking in the wrong place?

Any help is greatly appreciated!

Thank you, CCS

UlricE commented 8 years ago

Looking at your third command line, I see that you're running Pen as root since it's listening on port 53, but then it will refuse to open the control port. Look near the top of the log file and you should find a line similar to "Won't open control port running as root; use -u to run as different user".

And the error message from penctl simply means the control port isn't listening.
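If you want to confirm that from a script before calling penctl, something along these lines would do. It is only a sketch, assuming the control port is a plain TCP listener; the function name `control_port_open` and the default values are made up for illustration.

```python
import socket


def control_port_open(host: str = "localhost", port: int = 9000,
                      timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result here corresponds to the "error connecting to server" message: nothing is accepting connections on the control port.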

ccs10021 commented 8 years ago

Thank you very much for your help. I am now running pen as a non-root user, using an iptables NAT rule to redirect port 53 to 8080 on the listening VIP. The penctl channel is now working fine.

I'm still having some issues creating my init.d script so that the pen service starts when the server boots. I seem to be running into permission issues with the pid and log files. I'm not sure who should own those files, i.e., root or the non-root user.

I'm also working on a script for health checking the back end. It runs DNS queries against the target DNS servers that I am load balancing across; if those queries fail, the script calls penctl to blacklist the failed server. I just wanted to confirm with you that scripts are required for this type of health checking, i.e., that pen cannot health check downstream servers directly?

Thanks again, CCS

UlricE commented 8 years ago

You can get a bunch of hints for the init script here:

https://github.com/UlricE/pen/wiki/Pen-and-Systemd

It's written for systemd but a lot of the principles carry over.

You are right that Pen doesn't know anything about the back end health. Remember that unlike TCP, where the three-way handshake confirms that a connection has been made, there is no corresponding mechanism in UDP.
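So an external script along the lines you describe is the way to go. As a rough sketch of the glue between a health check and penctl: the function name `blacklist_dead`, its parameters and the injectable `probe` and `run` callables are assumptions for illustration; only the penctl command form is the one used earlier in this thread. Back ends are listed in the same order as on the Pen command line, so their position matches Pen's server index (starting at 0).

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Backend = Tuple[str, int]  # (host, port) of one back end, in Pen's server order


def blacklist_dead(backends: Sequence[Backend],
                   probe: Callable[[str, int], bool],
                   control: str = "localhost:9000",
                   seconds: int = 30,
                   penctl: str = "./penctl",
                   run: Callable[[List[str]], object] = subprocess.run) -> List[int]:
    """Probe every back end and blacklist the ones that fail.

    Returns the indices of the servers that were blacklisted.
    """
    dead = []
    for index, (host, port) in enumerate(backends):
        if probe(host, port):
            continue
        # Same command form as used earlier in this thread:
        #   penctl localhost:9000 server 1 blacklist 30
        run([penctl, control, "server", str(index), "blacklist", str(seconds)])
        dead.append(index)
    return dead
```

Here `probe` is whatever check suits the protocol, for DNS a query with a timeout. Running this at an interval shorter than the blacklist time should keep a dead server out of rotation until it starts answering again.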