Host on eth0 not accessible when using use_vmac

zbugrkx commented 1 day ago

I have a pretty simple configuration of two rpi running keepalived.

The config of one of them is below (the other differs only in having priority 100):

vrrp_sync_group KEEPV4V6 {
    group {
        KEEPV4
        KEEPV6
    }
}

vrrp_instance KEEPV4 {
    state BACKUP
    interface eth0
    use_vmac
    vmac_xmit_base
    virtual_router_id 44
    priority 200
    advert_int 1
    unicast_src_ip 10.10.10.11
    unicast_peer {
        10.10.10.10
    }

    authentication {
        auth_type PASS
        auth_pass secret
    }

    virtual_ipaddress {
        10.10.10.20/24
    }
}

vrrp_instance KEEPV6 {
    state BACKUP
    interface eth0
    use_vmac
    vmac_xmit_base
    virtual_router_id 44
    priority 200
    advert_int 1
    native_ipv6 true

    virtual_ipaddress {
        fe80::44/128
        xxxx:xxx:xxx:xxxx::44/64
    }
}

I have followed the required sysctl config below:

net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 1
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.eth0.arp_filter = 1

net.ipv4.conf.vrrp.44.arp_filter = 0
net.ipv4.conf.vrrp.44.accept_local = 1
net.ipv4.conf.vrrp.44.rp_filter = 0
net.ipv4.conf.vrrp6.44.arp_filter = 0
net.ipv4.conf.vrrp6.44.accept_local = 1
net.ipv4.conf.vrrp6.44.rp_filter = 0

The issue with this config is that I am unable to reach the host on its eth0 IP when keepalived is up and it is master. There are no responses to ping, etc. I can reach it just fine on the VIP.

I can't find anything in the documentation saying this is a limitation, so either I have made an obvious mistake in my configuration (and I apologize if that is the case) or I am hitting a bug.

Any ideas or suggestions?

Thanks! Regards Alex

pqarmitage commented 1 day ago

You do not need to change any sysctl settings; keepalived changes what is necessary for it to work.

native_ipv6 is not needed in your config.

It is interesting that you are using unicast for IPv4 but standard multicast for IPv6. Is there a reason for that?

I assume that unicast_src_ip and unicast_peer are also swapped over in the other config.

Is there a reason that you are using vmac_xmit_base?

What version of keepalived are you using? If it is an old version, this may relate to a problem we have since fixed. Can you please post the output of keepalived -v.

I have tested this on keepalived v2.3.2 and cannot reproduce the problem. You will need to provide further information, such as the output of ip addr show and ip route show on both systems and also the system you are trying to ping from. You also need to trace your ping packets with something like tcpdump to find where the packets are being dropped/blocked.

zbugrkx commented 1 day ago

Hi!

Thanks a lot for the quick response :). I will try to answer all the questions as best I can. Once again, I apologize if some of my mistakes are due to a poor understanding of the manual. I will also clarify a few bits of my configuration/setup.

- Both keepalived instances are running on two physical Pis; no virtualization is involved.
- Raspbian OS Bookworm, latest version (that apt gives me, at least).
- Without use_vmac, the system works perfectly fine (and has for a while). I am trying to improve some ARP/timeout issues I sometimes get, which use_vmac should help with. In my one day of testing with it, and leaving aside access to the host, it has clearly improved those issues on the VIP, so I think this is potentially the right path.
- I have removed native_ipv6. This was probably from an old tutorial I had followed (it was tough to find guides about IPv6 with keepalived).
- Same as above: I had a hard time finding the right settings for IPv6, but this is a very good point. I have now configured unicast for my IPv6 instance as well, using the link-local addresses (LLA) of both devices (swapped, of course).
- vmac_xmit_base: from reading the documentation, it seemed to be the preferred option, or did I get that wrong? Since my original intent is to fix some ARP table/caching issues, I assumed that having some messages sent from the underlying interface would be better. I also read something about macvlan discarding some packets if they don't match the right interface. Happy to be corrected on this if the preferred/better way is to not use it.

The version is where we might have an issue! I installed keepalived using apt-get, and 2.2.7 seems to be the latest it gives me... The issue I'm having may well be fixed already, but I have to figure out how to get a newer version without going down too complicated a route, especially when it comes to keeping packages updated or risking breaking my setup. Is the install from git described in the repo the only way to get the latest builds?

Thanks!

zbugrkx commented 1 day ago

Small update:

I uninstalled version 2.2.7 from apt and installed version 2.3.2 from snap. I assume they behave the same; at least, they appear to.

Keeping my current config + the above changes (still using vmac_xmit_base for now).

With the new version, I'm able to ping the eth0 IP addresses of the Pis from outside devices, but just after restarting the service I gradually lose pings/packets over 1-2 minutes, and this continues to the point where it barely answers or stops completely. This happens only on the master. The backup responds 100% of the time. The issue follows whichever device becomes master.

I am however unable to have the Pis ping each other on their eth0 IP addresses. Ping resumes as soon as I disable keepalived.

On the master device, ip route shows:

ip route show
default via 10.10.10.1 dev eth0 proto static metric 100
10.10.10.0/24 dev vrrp.44 proto kernel scope link src 10.10.10.20
10.10.10.0/24 dev eth0 proto kernel scope link src 10.10.10.10 metric 100

I'm wondering if the issue is actually right here in the routing table, as my subnet is assigned to two interfaces, and the route going through the VIP interface has the lower metric.
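
To illustrate that suspicion, here is a minimal Python sketch of how a lookup would choose between those routes. It assumes a simplified model (longest prefix first, then lowest metric) and that a route printed without a metric has metric 0. The `ROUTES` table and the `pick_route` helper are hypothetical names used only for this illustration; they are not part of keepalived or iproute2.

```python
import ipaddress

# Routes copied from the `ip route show` output above.
# Assumption: a route listed without a metric has metric 0.
ROUTES = [
    {"dst": "0.0.0.0/0", "dev": "eth0", "metric": 100},
    {"dst": "10.10.10.0/24", "dev": "vrrp.44", "metric": 0},
    {"dst": "10.10.10.0/24", "dev": "eth0", "metric": 100},
]


def pick_route(dest, routes):
    """Simplified lookup: longest prefix wins, ties go to the lowest metric."""
    addr = ipaddress.ip_address(dest)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["dst"])]
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r["dst"]).prefixlen, r["metric"]),
    )


# Traffic from the master to the other Pi on the same subnet.
print(pick_route("10.10.10.11", ROUTES)["dev"])
```

Under these assumptions, on-subnet traffic from the master leaves via vrrp.44 rather than eth0, while traffic to other networks still follows the default route on eth0.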

pqarmitage commented 18 hours ago

The route to 10.10.10.0/24 via vrrp.44 is likely to be the problem. The source MAC address of packets from the master will be 00:00:5e:00:01:2c, but that address also exists on the backup. You don't need the routes via vrrp.44, so make sure that the metric of the route via eth0 is lower than the metric via vrrp.44.
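
For reference, the MAC address mentioned above follows from the VRRP virtual router MAC format (00:00:5e:00:01:{VRID} for IPv4, 00:00:5e:00:02:{VRID} for IPv6) with virtual_router_id 44. A small Python sketch, where `vrrp_virtual_mac` is a hypothetical helper written for this illustration and not a keepalived function:

```python
def vrrp_virtual_mac(vrid, ipv6=False):
    """Return the VRRP virtual router MAC address for a VRID in 1-255."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    # The fifth octet is 0x01 for IPv4 and 0x02 for IPv6 virtual routers.
    family_octet = 0x02 if ipv6 else 0x01
    return "00:00:5e:00:%02x:%02x" % (family_octet, vrid)


print(vrrp_virtual_mac(44))             # 00:00:5e:00:01:2c
print(vrrp_virtual_mac(44, ipv6=True))  # 00:00:5e:00:02:2c
```

Both Pis configure the same VMAC for the same VRID, which is why traffic sourced from it is ambiguous between master and backup.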

zbugrkx commented 18 hours ago

Agreed, and I was planning to test that, but since this route is automatically added by keepalived:

1) is this normal/expected behavior, and should it be documented?

2) if not, is that something that could be fixed or made configurable with a parameter like "allow_host_routing" that would ensure the vrrp interface route gets created with a very high metric, or something like that?

zbugrkx commented 18 hours ago

Option 3 (probably better): allow setting the metric of the vmac route and/or set it by default higher than the Linux default of 100.

pqarmitage commented 17 hours ago

keepalived does not add the route on vrrp.44. The route is added by the kernel when 10.10.10.20/24 is added to vrrp.44.

You could try changing the VIP to be 10.10.10.20/32, which would be better since you want the 10.10.10.0/24 route to remain on eth0. Alternatively you could configure a route for 10.10.10.0/24 on eth0 with a lower metric than the default. You could at least try doing this manually to see if it resolves the problem.
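
As a rough illustration of why the prefix length matters, this Python sketch compares the network implied by each form of the VIP with the subnet already routed on eth0. It only models the address arithmetic, not the kernel itself, and `prefix_network` is a hypothetical helper name.

```python
import ipaddress


def prefix_network(vip):
    """Network covered by the VIP's prefix, e.g. 10.10.10.0/24 for a /24 VIP."""
    return ipaddress.ip_interface(vip).network


eth0_subnet = ipaddress.ip_network("10.10.10.0/24")

for vip in ("10.10.10.20/24", "10.10.10.20/32"):
    net = prefix_network(vip)
    # A /24 VIP implies a second route for the whole subnet on vrrp.44;
    # a /32 VIP covers only the VIP itself.
    print(vip, "->", net, "| same as eth0 subnet:", net == eth0_subnet)
```

With the /32 VIP, the only route for 10.10.10.0/24 is the one on eth0, so host traffic keeps using the eth0 address and MAC.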

zbugrkx commented 17 hours ago

This makes a lot of sense, and I should have thought of that. Especially for my use case, where I'm terminating traffic on the hosts and not routing anything, so /32 is fine (I was already doing /128 on IPv6... silly me).

Sorry for all that, and thanks for the patience and insight!

Closing.

acassen / keepalived

Host on eth0 not accessible when using use_vmac #2512