P4ELTE / t4p4s

Retargetable compiler for the P4 language
http://p4.elte.hu/
Apache License 2.0
118 stars 42 forks source link

Broadcasting appears to be broken #46

Open msg-programs opened 2 years ago

msg-programs commented 2 years ago

I tried to get a pure v1model version of the layer 2 switch from the examples (l2switch.p4) to work for the last couple of days. The program kept crashing with the following error:

T₄P₄S switch [...]  exited with error code 139 (C code execution: Segmentation fault)

Other programs worked fine and when I inserted the table entries via the P4RT-Shell myself, the program behaved as expected. The crashes only happened when the switch was trying to broadcast, yet there were no packets emitted by it.

Long story short, I suspect a bug in broadcast_packet(). From my understanding, broadcasting is implemented by checking if the egress port is 100 and then calling that method. It then enumerates the ports, creates copies if needed and then uses send_single_packet() (same file, L. 52) to send the packets out.

When that happens, it sends them out of whatever the egress_port is. At that point, that is still port 100 i.e. the symbolic broadcast port. In my case, this causes issues in dpdk_send_packet(). The packets queue up until MAX_PKT_BURST is reached and are then forcefully sent in a burst, which results in a segfault.

In my case, changing the second argument to the call to send_single_packet() from egress_port to portidx results in correct behavior. I suspect this to be a bug, but as my understanding of the source is very limited, I could be wrong and this fix could have unintended side effects. Here is the modified broadcast_packet() function that works for me:

void broadcast_packet(int egress_port /* unneeded  */, int ingress_port, LCPARAMS)
{
    uint8_t nb_ports = get_port_count();
    uint32_t port_mask = get_port_mask();

    uint8_t nb_port = 0;
    int broadcast_idx = 0;
    for (uint8_t portidx = 0; nb_port < nb_ports - 1 && portidx < RTE_MAX_ETHPORTS; ++portidx) {
        if (portidx == ingress_port) {
           continue;
        }

        bool is_port_disabled = (port_mask & (1 << portidx)) == 0;
        if (is_port_disabled)   continue;

        packet* pkt_out = (nb_port < nb_ports) ? clone_packet(pd->wrapper, lcdata->mempool) : pd->wrapper;
        send_single_packet(pkt_out, /*egress_port*/ portidx, ingress_port, broadcast_idx != 0, LCPARAMS_IN);

        ++nb_port;
        ++broadcast_idx;
    }

    if (unlikely(nb_port != nb_ports - 1)) {
        debug(" " T4LIT(!!!!,error) " " T4LIT(Wrong port count,error) ": " T4LIT(%d) " ports should be present, but only " T4LIT(%d) " found\n", nb_ports, nb_port);
    }
}

I did some very limited testing to see if setting the port to any invalid value causes the same issue. Sending packets to port 11 or 31 results in the error lcore 2 called tx_pkt_burst for not ready port [...] and a stack trace, while sending some to port 33, 99 or 101 causes a segfault. Sending packets to port 32 has a weird side effect, where printing the port in the function dpdk_send_packet() prints 56786. This also results in a segfault. It could be that this is another special port that I'm not aware of.

Setup

Running in a Ubuntu 21.04 VM, 4 cores, 8 GB RAM

examples.cfg:

l2switch_v1model                    arch=dpdk hugepages=512 model=v1model smem vethmode pieal piports dbg verbose p4rt

with pieal and piports defined as

pieal   -> ealopts += -c 0xc -n 4 --no-pci --vdev net_pcap0,rx_iface_in=veth0-s,tx_iface=veth0-s,tx_iface=veth0-s --vdev net_pcap1,rx_iface_in=veth1-s,tx_iface=veth1-s,tx_iface=veth1-s
piports -> cmdopts += -p 0x3 --config "\"(0,0,2)(1,0,3)\""

veth0-s and veth1-s are veth devices that are set up as follows:

The P4 program is linked below. It's basically the l2switch.p4 from the example directory, but without the preprocessor macros that make it work with both available architectures. I had to rename the file, as GitHub doesn't support uploading .p4 files for some reason.

l2switch_v1model.p4.txt

slaki commented 2 years ago

Thanks for the reported issue. Please check out the bugfix branch. The broadcasting issue has already been fixed there, and after a code review it will be merged with the master.

Naibaowjk commented 1 year ago

code review it will be merged with the m

but you dont merge after half year, good job