emmericp / ixy

A simple yet fast user space network driver for Intel 10 Gbit/s NICs written from scratch
BSD 3-Clause "New" or "Revised" License

SEGFAULT when using more than 1 queue #30

Open cyanide-burnout opened 4 years ago

cyanide-burnout commented 4 years ago

We have an issue when using more than one queue on ixgbe.

Stack trace:

```
#1 pkt_buf_free
#2 ixgbe_tx_batch
```

Both queues are processed in a single thread; we tried a single shared mempool as well as one mempool per queue. The result is the same: using any queue other than queue #0 triggers the segfault.

marcofaltelli commented 3 years ago

Same problem for me. I fixed it in ixgbe.c; the change must be applied to both the RX and TX queue structs:
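Roughly like this: the flexible array member at the end of each queue struct becomes a fixed-size array. This is a sketch based on the queue structs in ixy's ixgbe.c; the 4096 bound is an assumption following the MAX_RX_QUEUE_ENTRIES / MAX_TX_QUEUE_ENTRIES allocation constants there, and any compile-time upper bound works:

```c
// Sketch of the fix: give virtual_addresses a fixed size so that
// sizeof(struct ixgbe_rx_queue) covers it and queue[i] indexing works.
// 4096 mirrors ixy's MAX_*_QUEUE_ENTRIES allocation bound (assumption).
#define QUEUE_ENTRIES 4096

struct ixgbe_rx_queue {
	volatile union ixgbe_adv_rx_desc* descriptors;
	struct mempool* mempool;
	uint16_t num_entries;
	// position we are reading from
	uint16_t rx_index;
	// virtual addresses to map descriptors back to their mbuf for freeing
	void* virtual_addresses[QUEUE_ENTRIES]; // was: void* virtual_addresses[];
};

struct ixgbe_tx_queue {
	volatile union ixgbe_adv_tx_desc* descriptors;
	uint16_t num_entries;
	// position to clean up descriptors that were sent out by the nic
	uint16_t clean_index;
	// position to insert packets for transmission
	uint16_t tx_index;
	// virtual addresses to map descriptors back to their mbuf for freeing
	void* virtual_addresses[QUEUE_ENTRIES]; // was: void* virtual_addresses[];
};
```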

There are also other ways to solve this. However, this is probably the most immediate way. BTW, shout-out to the authors for the nice work!

cyanide-burnout commented 2 years ago

Yes, this patch works very well. With it we reached twice the performance of DPDK's ixgbe driver while using fewer CPU resources.

emmericp commented 2 years ago

> With it we reached twice the performance of DPDK's ixgbe driver while using fewer CPU resources.

That seems unlikely; can you elaborate on the exact comparison? The patch from the comment above basically turns a flexible array member into a fixed-size array, which, since we don't have any bounds checks anyway, should not make a difference for performance...

The only thing that I can think of is false sharing, but

cyanide-burnout commented 2 years ago

After applying the fix I rewrote my code. It now transmits in multiple threads, with 2-4 queues per thread. It's a production system that can use different methods to accelerate UDP transmission/reception. Since I wrote several backends providing transmission via optimised sockets, PACKET_MMAP, XDP, DPDK, and ixy, we ran some benchmarks. @stefansaraev did the testing, so I would like to ask him to publish the results. In any case, ixy's performance was quite a surprise, probably due to Intel's buggy ixgbe implementation and unnecessarily complicated code. At least I have found some bugs in the code of their XDP implementation (https://github.com/xdp-project/xdp-tutorial/issues/273).

cyanide-burnout commented 2 years ago

The bug in two words: wrong pointer arithmetic on array access. Just imagine which address rx_queue[1] ends up at: sizeof(struct ixgbe_rx_queue) doesn't include the actual length of virtual_addresses[], and virtual_addresses is not a pointer to an array stored somewhere else. It's a typical overlap: rx_queue[1] overlaps rx_queue[0].virtual_addresses.
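A minimal standalone demo of the overlap (a hypothetical simplified struct, not ixy's actual definition):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Simplified stand-in for ixy's struct ixgbe_rx_queue.
struct queue {
	uint16_t num_entries;
	void* virtual_addresses[]; // flexible array member, NOT counted by sizeof
};

int main(void) {
	size_t entries = 512;
	// The allocation reserves space for both flexible arrays...
	struct queue* queues = calloc(2, sizeof(struct queue) + entries * sizeof(void*));
	// ...but queues[1] only advances the pointer by sizeof(struct queue),
	// which lands inside queues[0].virtual_addresses.
	printf("queue 0 really ends at offset %zu\n",
	       sizeof(struct queue) + entries * sizeof(void*));
	printf("queues[1] starts at offset %zu\n",
	       (size_t) ((char*) &queues[1] - (char*) queues));
	free(queues);
	return 0;
}
```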

cyanide-burnout commented 2 years ago

I've fixed my comment above; the correct explanation is there now. It's 01:30 AM here, I have to be in bed ;)