flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes

flannel add multicast support - PIMv2 adjacency #200

Closed greenpau closed 7 years ago

greenpau commented 9 years ago

PIMv2 Hello messages issued from the flannel0 interface of a host do not reach the host's peers. It seems that the packets are getting dropped.

I suspect that it is related to the following lines of the ip_route_input_slow() function in the Linux kernel.

        if (ipv4_is_zeronet(saddr))
                goto martian_source;

        if (ipv4_is_zeronet(daddr))
                goto martian_destination;

When flanneld starts, it assigns an x.x.x.0 IP address to the flannel0 interface.

http://lxr.free-electrons.com/source/include/linux/in.h?v=2.6.32#L277

Is there a way to start/change the addressing of the flannel0 interface to a non-zero address?

greenpau commented 9 years ago

there will be no match:

return (addr & htonl(0xff000000)) == htonl(0x00000000);
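
For reference, a quick standalone check (not flannel code, just a sketch of the same test) showing why an x.x.x.0 address such as 10.252.63.0 never matches: the mask keeps only the first octet.

#include <stdio.h>
#include <arpa/inet.h>

/* same test as the kernel's ipv4_is_zeronet(): true only for 0.x.x.x addresses */
static int is_zeronet(in_addr_t addr) {
        return (addr & htonl(0xff000000)) == htonl(0x00000000);
}

int main(void) {
        struct in_addr a;
        inet_aton("10.252.63.0", &a);          /* flannel0-style x.x.x.0 address */
        printf("%d\n", is_zeronet(a.s_addr));  /* prints 0: first octet is 10, not 0 */
        return 0;
}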

disregard, closing.

greenpau commented 9 years ago

re-opening.

Added a static route to send all multicast to the flannel0 interface:

# ip route show to match 224.0.0.13
default via 192.168.16.1 dev eth0
224.0.0.0/4 dev flannel0  scope link
#

Nevertheless, PIMv2 Hello messages do not reach the other side because they are being dropped by the kernel.

3  30.030581  10.252.63.0 -> 224.0.0.13   PIMv2 30 Hello
4  30.030616  10.252.63.0 -> 10.252.63.0  ICMP 56 Destination unreachable (Network unreachable)

The IDs in the PIMv2 Hello and in the ICMP Destination Unreachable match.

ENETUNREACH: the network of the given addr isn't reachable from this host.

There are two options: it is triggered either by fib_lookup() failing or by ipmr_rule_action().

greenpau commented 9 years ago

Perhaps it is related to flanneld's code in flannel/backend/udp/proxy.c:

./backend/udp/proxy.c:static void send_net_unreachable(int tun, char *offender) {
./backend/udp/proxy.c:      send_net_unreachable(tun, buf);

greenpau commented 9 years ago

It is likely that the packet to 224.0.0.13 is not matched by the find_route() function:

        next_hop = find_route((in_addr_t) iph->daddr);
        if( !next_hop ) {
                send_net_unreachable(tun, buf);
                goto _active;
        }

greenpau commented 9 years ago

tun_to_udp() sends exactly one packet to next_hop.

        sock_send_packet(sock, buf, pktlen, next_hop);

For PIMv2 adjacency to work, it must send that packet to all peers.

greenpau commented 9 years ago

let's try adding extra logging:

        if( !next_hop ) {
                log_error("No next hop for %s\n", inet_ntoa(*(struct in_addr *)&iph->daddr));
                send_net_unreachable(tun, buf);
                goto _active;
        }

After adding the above log line, flanneld reports:

# journalctl -u flanneld --reverse
-- Logs begin at Wed 2015-06-03 16:25:07 UTC, end at Fri 2015-06-05 15:50:17 UTC. --
Jun 05 15:50:17 ip-192-168-16-146.inf.ise.com flanneld[11216]: No next hop for 224.0.0.13
Jun 05 15:49:48 ip-192-168-16-146.inf.ise.com flanneld[11216]: No next hop for 224.0.0.13

that's progress :+1:

greenpau commented 9 years ago

keeping this open to submit a PR for multicast.

eyakubovich commented 9 years ago

@greenpau flannel does not support multicast. See my response in https://github.com/coreos/flannel/issues/179 for details.

greenpau commented 9 years ago

> @greenpau flannel does not support multicast. See my response in #179 for details.

@eyakubovich, I am getting up to speed with Go and remembering some C while learning your code. There are some great techniques in it!

I was able to exchange multicast PIMv2 Hellos between flannel peers (I failed to create an adjacency, though). My goal is to watch/intercept local multicast traffic for IGMP membership reports from flannel peers and maintain a table of peer-subscription relationships.

Currently, I am using http://weave.works/ for overlay and it supports multicast. However, I would like to make flannel work with multicast because, in my humble opinion, flannel's overlay is better implemented.

P.S. The pointer math in the route reallocation code hurts :+1:

lemenkov commented 9 years ago

@greenpau what exactly are you trying to achieve? It seems that you're trying to run something like Avahi on top of flannel. Is that correct?

Just for those who might be interested: in some topologies, it's possible to broadcast Avahi changes across the hosts without adjusting flannel. Just use a proper network topology.

greenpau commented 9 years ago

@lemenkov, I added the docker0 and flannel0 interfaces to the pimd PIM-SM daemon.

I want every node on my flannel network to receive packets destined to 224.0.0.0/24.

It definitely requires a code change, e.g. a modified tun_to_udp() in proxy.c:

        iph = (struct iphdr *)buf;

        if ( ( ntohl(iph->daddr) & 0xffffff00) == 0xe0000000 ) {
                iph->ttl++;
                for (i = 0; i < peers_cnt; i++)  {
                        sock_send_packet(sock, buf, pktlen, &peers[i]);
                        log_error("local multicast packet from %s to %s was sent to TBD (%d)\n",
                                inaddr_str(iph->saddr, saddr, sizeof(saddr)),
                                inaddr_str(iph->daddr, daddr, sizeof(daddr)), i);
                }
        } else if ( ( ntohl(iph->daddr) & 0xf0000000) == 0xe0000000 ) {
                log_error("detected multicast packet destined for %s, dropping ...\n",
                                inet_ntoa(*(struct in_addr *)&iph->daddr));
                send_net_unreachable(tun, buf);
                goto _active;
        } else {
                /* log_error("%s is not a multicast destination\n",
                                inet_ntoa(*(struct in_addr *)&iph->daddr)); */

                next_hop = find_route((in_addr_t) iph->daddr);

When I receive a packet destined to 224.0.0.0/24, I increment its TTL, because these packets arrive with TTL=1. Then I unicast that packet to all my flannel peers.

For now, I do not forward non-local multicast.
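
Not part of the patch, just a sketch with made-up helper names: the two masks above distinguish link-local multicast (224.0.0.0/24, which carries PIM, IGMP, all-hosts, etc.) from multicast in general (224.0.0.0/4). Also note that iph->ttl is covered by the IPv4 header checksum, so unless the checksum is repaired elsewhere in proxy.c, the receiving kernel may drop the rewritten packet; a full recompute is shown below.

#include <stdint.h>
#include <stddef.h>
#include <netinet/ip.h>
#include <arpa/inet.h>

/* 224.0.0.0/24: "local network control" multicast, e.g. 224.0.0.13 (PIM routers) */
static int is_local_multicast(uint32_t daddr) {
        return (ntohl(daddr) & 0xffffff00) == 0xe0000000;
}

/* 224.0.0.0/4: any IPv4 multicast */
static int is_any_multicast(uint32_t daddr) {
        return (ntohl(daddr) & 0xf0000000) == 0xe0000000;
}

/* recompute the IPv4 header checksum after mutating a field such as ttl */
static void ip_fix_checksum(struct iphdr *iph) {
        uint32_t sum = 0;
        uint16_t *p = (uint16_t *)iph;
        size_t i, words = iph->ihl * 2;        /* ihl counts 32-bit words */

        iph->check = 0;
        for (i = 0; i < words; i++)
                sum += p[i];
        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);
        iph->check = (uint16_t)~sum;
}

With helpers like these, the branch above reads: if is_local_multicast(), bump the TTL, fix the checksum, and unicast a copy to every peer; if is_any_multicast() but not local, drop the packet and reply with net-unreachable.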

greenpau commented 9 years ago

Receiving and sending local multicast:

Jun 08 12:56:22 ip-192-168-16-147.inf.ise.com flanneld[18471]: sent local multicast packet from 10.252.93.0 to 224.0.0.1 via 192.168.16.146
Jun 08 12:56:24 ip-192-168-16-147.inf.ise.com flanneld[18471]: sent local multicast packet from 10.252.93.0 to 224.0.0.22 via 192.168.16.146
Jun 08 12:56:24 ip-192-168-16-147.inf.ise.com flanneld[18471]: sent local multicast packet from 10.252.93.0 to 224.0.0.13 via 192.168.16.146
Jun 08 12:56:25 ip-192-168-16-147.inf.ise.com flanneld[18471]: sent local multicast packet from 10.252.93.0 to 224.0.0.2 via 192.168.16.146
Jun 08 12:56:26 ip-192-168-16-147.inf.ise.com flanneld[18471]: received packet for 224.0.0.1 from 10.252.63.0 via 192.168.16.146
Jun 08 12:56:27 ip-192-168-16-147.inf.ise.com flanneld[18471]: received packet for 224.0.0.22 from 10.252.63.0 via 192.168.16.146
Jun 08 12:56:30 ip-192-168-16-147.inf.ise.com flanneld[18471]: received packet for 224.0.0.13 from 10.252.63.0 via 192.168.16.146
Jun 08 12:56:34 ip-192-168-16-147.inf.ise.com flanneld[18471]: received packet for 224.0.0.2 from 10.252.63.0 via 192.168.16.146

However, on receipt, the packets are not getting to pimd. See https://github.com/troglobit/pimd/issues/49

greenpau commented 9 years ago

update: the adjacency was successfully formed with 669db13

greenpau commented 9 years ago

well, I suppose we could switch to Russian :smile_cat:

erandu commented 8 years ago

@greenpau have you solved multicast between containers running on two different nodes? I have tested with your build and it doesn't work, so I am wondering where the problem comes from.

greenpau commented 8 years ago

> @greenpau have you solved multicast between containers running on two different nodes? I have tested with your build and it doesn't work, so I am wondering where the problem comes from.

@erandu, as far as I remember, I did solve it. There are two things you should consider:

  1. You must have a multicast daemon running.
  2. You must have iptables NAT/mangle rules. Consider the following: when multicast packets hit the remote host (not a container), they are handled by the kernel network stack. You need to tell the kernel to send the multicast packets to the container or containers. Hint: iptables -t mangle.

tomdee commented 7 years ago

Tracking multicast under #179 so closing this issue.