
nc -v Keepalived VIP gets error: No route to host #2472

Open PunkFleet opened 3 days ago

PunkFleet commented 3 days ago

I followed this guide to set up a LB for my k8s cluster: https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#options-for-software-load-balancing but the cluster nodes cannot connect to the LB VIP. I've posted all my configs here; I think my keepalived + haproxy config is correct, but the error still exists. I suspect the hosting provider may be restricting the VIP on the machine? https://stackoverflow.com/questions/79037704/how-to-build-keepalived-haproxy-load-balancer-on-digitalocean

# connecting to the VIP from a cluster node
[root@centos-s-2vcpu-2gb-nyc3-02 ~]# nc -vz 10.99.99.99 5000
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: No route to host.

haproxy.cfg

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
        ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
        ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
        ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        option  http-server-close
        option  redispatch
        retries 1
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

frontend apiserver
        bind *:5000
        mode tcp
        option tcplog
        default_backend apiserverbackend
backend apiserverbackend
        mode http
        option httpchk GET /healthz
        http-check expect status 200
        balance     roundrobin
        default-server check inter 3s fall 3 rise 2
        server master1 10.99.99.4:6443 check ssl verify none
        server master2 10.99.99.9:6443 check ssl verify none
        server master3 10.99.99.8:6443 check ssl verify none
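
As a sanity check, HAProxy can validate the file above without starting the service (-c runs the parser in check mode only):

haproxy -c -f /etc/haproxy/haproxy.cfg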

keepalived.conf

! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 3
  weight -2
  fall 10
  rise 2
}
vrrp_instance VI_1 {
  state MASTER
  interface eth1
  virtual_router_id 51
  priority 101
  advert_int 1
  authentication {
      auth_type PASS
      auth_pass 1111
  }
  unicast_src_ip 10.99.99.2
  unicast_peer {
    10.99.99.3
  }
  virtual_ipaddress {
    10.99.99.99
  }
  track_script {
    check_apiserver
  }
}
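
The check script referenced above (/etc/keepalived/check_apiserver.sh) is not shown; the version in the linked kubeadm guide is roughly the following, where APISERVER_VIP and APISERVER_DEST_PORT are placeholders to substitute (here presumably 10.99.99.99 and 5000):

#!/bin/sh

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

# fail if the local apiserver endpoint does not answer
curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
# if this host currently holds the VIP, the VIP endpoint must answer too
if ip addr | grep -q ${APISERVER_VIP}; then
    curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi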

ip a shows the VIP is configured:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 3a:f0:eb:13:ff:b9 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 167.71.95.103/20 brd 167.71.95.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.17.0.5/16 brd 10.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::38f0:ebff:fe13:ffb9/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 8e:99:21:72:51:83 brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    altname ens4
    inet 10.99.99.2/24 brd 10.99.99.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.99.99.99/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::8c99:21ff:fe72:5183/64 scope link 
       valid_lft forever preferred_lft forever
root@lb1:~# 
nser77 commented 2 days ago

Hi @PunkFleet, can you please share the routing table of the centos-s-2vcpu-2gb-nyc3-02 host?

PunkFleet commented 2 days ago

> Hi @PunkFleet, can you please share the routing table of the centos-s-2vcpu-2gb-nyc3-02 host?

Hi @nser77, here is the host's routing table:

[root@centos-s-2vcpu-2gb-nyc3-02 ~]# ip route show
default via 138.197.96.1 dev eth0 proto static metric 100 
10.17.0.0/16 dev eth0 proto kernel scope link src 10.17.0.7 metric 100 
10.99.99.0/24 dev eth1 proto kernel scope link src 10.99.99.4 metric 101 
138.197.96.0/20 dev eth0 proto kernel scope link src 138.197.99.141 metric 100 
[root@centos-s-2vcpu-2gb-nyc3-02 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 6a:23:c4:67:3e:ed brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 138.197.99.141/20 brd 138.197.111.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.17.0.7/16 brd 10.17.255.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6823:c4ff:fe67:3eed/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 22:da:6e:1a:8f:81 brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    altname ens4
    inet 10.99.99.4/24 brd 10.99.99.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::20da:6eff:fe1a:8f81/64 scope link 
       valid_lft forever preferred_lft forever

and here is my iptables information:

[root@centos-s-2vcpu-2gb-nyc3-02 ~]# curl http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address
10.17.0.7
[root@centos-s-2vcpu-2gb-nyc3-02 ~]# iptables -L -v -n
Chain INPUT (policy ACCEPT 19M packets, 3237M bytes)
 pkts bytes target     prot opt in     out     source               destination         
  19M 3237M KUBE-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 23M packets, 3165M bytes)
 pkts bytes target     prot opt in     out     source               destination         
  23M 3165M KUBE-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain KUBE-FIREWALL (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       0    --  *      *      !127.0.0.0/8          127.0.0.0/8          /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT

Chain KUBE-KUBELET-CANARY (0 references)
 pkts bytes target     prot opt in     out     source               destination        
pqarmitage commented 2 days ago

There doesn't appear to be anything wrong with your keepalived configuration; your haproxy configuration is not a matter for this list.

You have shown above that keepalived is working correctly, since the IP address 10.99.99.99 is configured on eth1, so you are really going to have to diagnose the problem at your end.

Since 10.99.99.99 is on the same subnet as 10.99.99.4, the "No route to host" error message means that 10.99.99.4 cannot obtain a MAC address for 10.99.99.99 (if 10.99.99.4 could communicate with 10.99.99.99 but there were simply no service listening on port 5000, then you would get a "Connection refused" error message from nc).
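
A direct way to test this is an ARP probe for the VIP from 10.99.99.4, assuming the iputils arping utility is available (-I selects the interface, -c the number of probes):

arping -I eth1 -c 3 10.99.99.99

If the MASTER is answering ARP for the VIP, this should report replies from 8e:99:21:72:51:83; no replies would confirm that ARP resolution is failing.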

I suggest you try using wireshark or tcpdump to trace what is happening on both 10.99.99.4 and 10.99.99.2 to see if you can identify where things are going wrong.
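
For example, a capture along these lines on both hosts (the private interface is assumed to be eth1 on each) would show whether the ARP request for 10.99.99.99 reaches the MASTER and whether any reply makes it back:

tcpdump -eni eth1 'arp or host 10.99.99.99'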

PunkFleet commented 2 days ago

the "No route to host" error message means that 10.99.99.4 cannot obtain a MAC address for 10.99.99.99

10.99.99.99 is a VIP, not a real machine, so how do I configure a MAC address for it? I'm a newbie here; is there any way I can fix this error?

pqarmitage commented 1 day ago

The ip a output above shows:

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 8e:99:21:72:51:83 brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    altname ens4
    inet 10.99.99.2/24 brd 10.99.99.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.99.99.99/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::8c99:21ff:fe72:5183/64 scope link 
       valid_lft forever preferred_lft forever

The IP address 10.99.99.99 is configured on eth1, and therefore the MAC address associated with 10.99.99.99 is 8e:99:21:72:51:83.

Can you please try your nc command on 10.99.99.4 while keepalived is running and the address 10.99.99.99 is configured, and immediately afterwards execute arp -a on 10.99.99.4 and post the results in this issue.

PunkFleet commented 23 hours ago

> Can you please try your nc command on 10.99.99.4 while keepalived is running and the address 10.99.99.99 is configured, and immediately afterwards execute arp -a on 10.99.99.4 and post the results in this issue.

The keepalived VIP is 10.99.99.99, and keepalived is running on the LB1 machine:

root@lb1:~# systemctl status keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-10-01 10:43:13 UTC; 16h ago
       Docs: man:keepalived(8)
             man:keepalived.conf(5)
             man:genhash(1)
             https://keepalived.org
   Main PID: 47012 (keepalived)
      Tasks: 3 (limit: 1112)
     Memory: 2.2M (peak: 2.5M)
        CPU: 14.558s
     CGroup: /system.slice/keepalived.service
             ├─47012 /usr/sbin/keepalived --dont-fork
             ├─47013 /usr/sbin/keepalived --dont-fork
             └─47014 /usr/sbin/keepalived --dont-fork

Oct 01 10:43:13 lb1 Keepalived_vrrp[47014]: (VI_1) received lower priority (100) advert from 10.99.99.3 - discarding
Oct 01 10:43:14 lb1 Keepalived_vrrp[47014]: (VI_1) received lower priority (100) advert from 10.99.99.3 - discarding
Oct 01 10:43:15 lb1 Keepalived_vrrp[47014]: (VI_1) received lower priority (100) advert from 10.99.99.3 - discarding
Oct 01 10:43:16 lb1 Keepalived_vrrp[47014]: (VI_1) received lower priority (100) advert from 10.99.99.3 - discarding
Oct 01 10:43:16 lb1 Keepalived_vrrp[47014]: (VI_1) Entering MASTER STATE
Oct 01 10:43:18 lb1 Keepalived_healthcheckers[47013]: TCP_CHECK on service [10.99.99.3]:tcp:8443 failed.
Oct 01 10:43:18 lb1 Keepalived_healthcheckers[47013]: Removing service [10.99.99.3]:tcp:8443 from VS [10.99.99.99]:tcp:8443
Oct 01 10:43:20 lb1 Keepalived_healthcheckers[47013]: TCP_CHECK on service [10.99.99.2]:tcp:8443 failed.
Oct 01 10:43:20 lb1 Keepalived_healthcheckers[47013]: Removing service [10.99.99.2]:tcp:8443 from VS [10.99.99.99]:tcp:8443
Oct 01 10:43:20 lb1 Keepalived_healthcheckers[47013]: Lost quorum 1-0=1 > 0 for VS [10.99.99.99]:tcp:8443

vrrp_instance VI_1 {
  state MASTER
  interface eth1
  virtual_router_id 51
  priority 101
  advert_int 1
  garp_master_delay 1
  authentication {
      auth_type PASS
      auth_pass 1111
  }
  unicast_src_ip 10.99.99.2
  unicast_peer {
    10.99.99.3
  }
  virtual_ipaddress {
    10.99.99.99
  }
  track_script {
    check_apiserver
  }
}

and on the Master1 machine:

[root@centos-s-2vcpu-2gb-nyc3-02 ~]# nc -v 10.99.99.99 8443
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: No route to host.
[root@centos-s-2vcpu-2gb-nyc3-02 ~]# nc -v 10.99.99.99
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: No route to host.
[root@centos-s-2vcpu-2gb-nyc3-02 ~]# arp -a
? (10.99.99.7) at ce:52:d2:e7:8a:51 [ether] on eth1
? (10.17.0.6) at <incomplete> on eth0
? (10.99.99.3) at 4e:79:e6:8b:41:6c [ether] on eth1
? (10.99.99.6) at b6:05:98:06:51:15 [ether] on eth1
? (10.17.0.99) at <incomplete> on eth0
? (10.99.99.99) at <incomplete> on eth1
_gateway (138.197.96.1) at fe:00:00:00:01:01 [ether] on eth0
? (10.17.0.5) at <incomplete> on eth0
? (10.17.0.2) at <incomplete> on eth0
? (10.99.99.2) at 8e:99:21:72:51:83 [ether] on eth1
? (10.99.99.5) at 9e:f9:4e:cd:8e:eb [ether] on eth1
pqarmitage commented 18 hours ago

The arp -a output shows that 10.99.99.4 can get an ARP response for 10.99.99.2 but not for 10.99.99.99. I am not sure of the relevance of the information shown before that arp -a output.

You need to investigate what is happening on your network to identify the cause of your problem; there is nothing we can do since we do not have access to your network.

PunkFleet commented 17 hours ago

> The arp -a output shows that 10.99.99.4 can get an ARP response for 10.99.99.2 but not for 10.99.99.99. I am not sure of the relevance of the information shown before that arp -a output.
>
> You need to investigate what is happening on your network to identify the cause of your problem; there is nothing we can do since we do not have access to your network.

10.99.99.2 is the LB1 machine's private IP; 10.99.99.99 is the keepalived VIP on LB1 and LB2. Here's the problem: I can connect to the real private IP of the load balancer machine, but I cannot connect to the VIP at all.

pqarmitage commented 17 hours ago

As I said above:

> You need to investigate what is happening on your network to identify the cause of your problem; there is nothing we can do since we do not have access to your network.