facebookincubator / katran

A high performance layer 4 load balancer
GNU General Public License v2.0

[katran][healthchecking]: allow to mangle source ip #218

Closed: tehnerd closed this pull request 10 months ago

tehnerd commented 10 months ago

Right now, IPIP healthchecks use the balancer's own source IP, while data packets (the ones that are actually load balanced) use specially crafted (mangled) source IPs so that they play nicely with the backend NIC's RSS. However, there can be unfortunate scenarios where a backend has firewall rules that allow packets from internal IPs (e.g. from 10/8) but not from the mangled space (172.16/16 by default). In such cases the load balancer can consider a backend healthy (healthchecks pass just fine, because 10/8 is permitted by the FW) while data packets are actually being dropped/blackholed by the FW (because 172.16/16 is not allowed).
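To make the failure mode concrete, here is a minimal sketch of the kind of backend FW check described above. It assumes a hypothetical allowlist of 10.0.0.0/8 and a hypothetical balancer HC source of 10.1.2.3; the helper names `ipv4`, `in_prefix` and `backend_fw_allows` are illustrative only and are not part of katran. The HC source passes the check, while a mangled data-packet source from 172.16/16 (e.g. 172.16.119.76, as seen in the capture) is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from dotted-quad octets. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* True if addr lies inside prefix/len (host byte order, 0 <= len <= 32). */
static bool in_prefix(uint32_t addr, uint32_t prefix, unsigned len) {
    assert(len <= 32);
    /* len == 0 is special-cased: shifting a 32-bit value by 32 is undefined. */
    uint32_t mask = (len == 0) ? 0 : ~(uint32_t)0 << (32 - len);
    return (addr & mask) == (prefix & mask);
}

/* Hypothetical backend firewall: only sources from 10.0.0.0/8 are let in. */
static bool backend_fw_allows(uint32_t src) {
    return in_prefix(src, ipv4(10, 0, 0, 0), 8);
}
```

With this rule, `backend_fw_allows(ipv4(10, 1, 2, 3))` is true (the HC passes), but `backend_fw_allows(ipv4(172, 16, 119, 76))` is false (the data packet is dropped). That is exactly the "healthy but blackholed" mismatch.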

This diff creates common functions that can be reused by both the HC program and the balancer itself to enable healthcheck source mangling. It also introduces a new compile-time flag that enables this feature for IPIP healthchecks (so it is a no-op for the default setup).
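The idea of one shared mangling helper plus a compile-time switch can be sketched roughly as follows. This is a hypothetical userspace illustration, not the code from this diff. The function names `mangle_v4_src` and `hc_outer_src` are made up, and the bit layout (keep the 172.16/16 prefix and fill the low 16 bits with the source port XOR the low 16 bits of the source IP) is an assumption chosen only to show how different flows map to different outer sources. Only the `MANGLE_HC_SOURCE` define name comes from this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mangled space: 172.16.0.0/16, in host byte order. */
#define MANGLED_V4_PREFIX 0xAC100000u

/* The prefix must leave the low 16 bits free for the per-flow suffix. */
static_assert((MANGLED_V4_PREFIX & 0xFFFFu) == 0,
              "mangled prefix must be a /16");

/* Hypothetical helper shared by the balancer and the HC path: keep the
 * 172.16/16 prefix and derive the low 16 bits from the flow, so that
 * different flows get different outer sources (and spread across RSS
 * queues on the backend NIC). */
static uint32_t mangle_v4_src(uint32_t src_ip, uint16_t src_port) {
    uint16_t suffix = (uint16_t)(src_port ^ (uint16_t)(src_ip & 0xFFFFu));
    return MANGLED_V4_PREFIX | suffix;
}

/* Outer IPv4 source for an IPIP healthcheck packet. Unless the code is
 * built with MANGLE_HC_SOURCE, this returns the balancer's own address,
 * so the default build behaves exactly as before. */
static uint32_t hc_outer_src(uint32_t balancer_ip, uint32_t inner_src,
                             uint16_t inner_port) {
#ifdef MANGLE_HC_SOURCE
    (void)balancer_ip;
    return mangle_v4_src(inner_src, inner_port);
#else
    (void)inner_src;
    (void)inner_port;
    return balancer_ip;
#endif
}
```

Building with `-DMANGLE_HC_SOURCE` switches `hc_outer_src` to the mangled address. Because both paths call the same helper, healthchecks and data packets would then originate from the same 172.16/16 space, and a backend FW would treat them the same way.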

Tested by: the default katran_tester unit tests pass as-is (for both the balancer and the healthchecker).

With the MANGLE_HC_SOURCE define set, HC source addresses come from the "mangled" space:

10:19:24.279152 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 43)
    192.168.1.1.31337 > 10.200.1.1.80: [udp sum ok] UDP, length 15

# Mangled v4 src
10:19:24.279155 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto IPIP (4), length 63)
    172.16.119.76 > 10.0.0.1: IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto UDP (17), length 43)
    192.168.1.1.31337 > 10.200.1.1.80: [udp sum ok] UDP, length 15

10:19:24.279160 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto TCP (6), length 55)
    192.168.1.1.31337 > 10.200.1.1.80: Flags [.], cksum 0x27e4 (correct), seq 0:15, ack 1, win 8192, length 15: HTTP

10:19:24.279162 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto IPIP (4), length 75)
    172.16.119.76 > 10.0.0.2: IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto TCP (6), length 55)
    192.168.1.1.31337 > 10.200.1.1.80: Flags [.], cksum 0x27e4 (correct), seq 0:15, ack 1, win 8192, length 15: HTTP

10:19:24.279169 IP6 (hlim 64, next-header TCP (6) payload length: 35) fc00:2::1.31337 > fc00:1::1.80: Flags [.], cksum 0xfd4f (correct), seq 0:15, ack 1, win 8192, length 15: HTTP

# Mangled v6 src
10:19:24.279173 IP6 (hlim 64, next-header IPv6 (41) payload length: 75) 100::697a:1337 > fc00::1: IP6 (hlim 64, next-header TCP (6) payload length: 35) fc00:2::1.31337 > fc00:1::1.80: Flags [.], cksum 0xfd4f (correct), seq 0:15, ack 1, win 8192, length 15: HTTP
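For IPv6 the capture shows an outer source under 100:: (100::697a:1337). A rough sketch of an analogous v6 helper follows. It assumes the mangled v6 space starts with 0100:: as in the capture, copies the last 32 bits of the inner source, and XORs the source port into the lowest 16 bits. The name `mangle_v6_src` and this exact bit layout are hypothetical, and the sketch does not try to reproduce the exact low bits seen in the capture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical v6 counterpart: outer source = 0100:: prefix, with the last
 * 32 bits copied from the inner source and the source port XORed into the
 * lowest 16 bits. Addresses are 16-byte arrays in network byte order. */
static void mangle_v6_src(const uint8_t inner_src[16], uint16_t src_port,
                          uint8_t out[16]) {
    assert(inner_src != NULL && out != NULL);
    memset(out, 0, 16);
    out[0] = 0x01;                      /* 0100::, as seen in the capture */
    memcpy(out + 12, inner_src + 12, 4); /* keep the inner source's last 32 bits */
    out[14] ^= (uint8_t)(src_port >> 8);   /* port, high byte */
    out[15] ^= (uint8_t)(src_port & 0xFFu); /* port, low byte */
}
```

For an inner source of fc00:2::1 and source port 31337 (0x7a69), this sketch yields 100::7a68. That is a different value from the capture, but it shows how each flow gets its own outer source inside the mangled prefix.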
tehnerd commented 10 months ago

Hey @avasylev, do you folks still accept external patches? There has been no activity from FB folks on any of the recently opened PRs.

avasylev commented 10 months ago

Of course we do; we just sometimes get caught up with other things. I'll take care of the PRs in the next couple of days.

facebook-github-bot commented 10 months ago

@avasylev has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 10 months ago

@avasylev merged this pull request in facebookincubator/katran@43889f1932c4e7d79e55985e3aa1f4fa35197320.