OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master

ec_receive_processdata() is blocking for several milliseconds on Debian 11 but not on Debian 10 #612

Closed lbckmnn closed 2 years ago

lbckmnn commented 2 years ago

Hi, I recently upgraded from Debian 10 (with installed Debian RT Kernel 4.19.0-17-rt-amd64) to Debian 11 (5.10.0-14-rt-amd64) resulting in some strange behavior in two different applications:

  1. The ec_receive_processdata() function is blocking for up to 10 ms, resulting in real-time issues
  2. ec_receive_processdata() often returns -1 (10-20 times per second at a 500 Hz EtherCAT cycle)

One application is affected by problem 1 and the other application by problem 2.

I created a simple benchmark program to debug this. It is basically the same as the simple test example; the differences from the simple test are:

I ran the application on a system with Debian 10 and on a system with Debian 11 for 10 minutes against a few Beckhoff terminals, while also running stress --cpu 4 --io 2 --vm 2 --vm-bytes 128M --hdd 2. The command from drvcomment.txt was also applied.

The network interface in use seems to be a Realtek 8111g. Unfortunately, problem 2 could not be reproduced as there were only two WKC mismatches, but problem 1 is visible in the histogram:

[Histograms of ec_receive_processdata() latency: Debian 10 (4.19.0-17-rt-amd64) vs. Debian 11 (5.10.0-14-rt-amd64); images ethercat_receive_Debian_10 and ethercat_receive_Debian_11]

I tried the same program with a cheap USB Ethernet converter. This works just fine (with a higher latency, of course), so I guess the problem is the network driver?

Also, cyclictest on both systems does not indicate any problems.

Is there anything I can do about this?

ArthurKetels commented 2 years ago

I have noticed this change in behaviour during the Linux kernel progression. The timing problem is created by how Linux internally handles the socket receive buffer. If you dig deeper into your timing, you will see the latency increase is mostly in the recv() function. Its internal handling has changed to optimize interrupt handling, but at the cost of latency. You can get better timing by setting the socket to non-blocking. Original in nicdrv.c:

   /* we use RAW packet socket, with packet type ETH_P_ECAT */
   *psock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ECAT));

Modified

   /* we use RAW packet socket, with packet type ETH_P_ECAT */
   *psock = socket(PF_PACKET, SOCK_RAW | SOCK_NONBLOCK, htons(ETH_P_ECAT));

However, this will lead to busy polling on the SOEM side and increase CPU usage a lot. You decide what behaviour suits your application best.
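
To illustrate what the non-blocking socket implies for the caller, here is a minimal standalone sketch (my own illustration, not the actual ecx_recvpkt() code in nicdrv.c, and the names are made up): recv() now returns EAGAIN/EWOULDBLOCK immediately when no frame has arrived yet, so the receive path has to spin until the frame shows up or a deadline passes.

   /* Sketch only: busy-poll a non-blocking RAW socket until a frame arrives
    * or the deadline expires. The spinning is what burns the extra CPU. */
   #include <errno.h>
   #include <stdint.h>
   #include <sys/socket.h>
   #include <sys/types.h>
   #include <time.h>

   static int64_t now_ns(void)
   {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
   }

   /* Returns the number of bytes received, or -1 on timeout/error. */
   static int recv_busy_poll(int sock, void *buf, size_t len, int64_t timeout_ns)
   {
      int64_t deadline = now_ns() + timeout_ns;
      for (;;)
      {
         ssize_t n = recv(sock, buf, len, 0);
         if (n >= 0)
            return (int)n;
         if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;   /* real socket error */
         if (now_ns() >= deadline)
            return -1;   /* timed out; CPU was spinning the whole time */
      }
   }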

I am experimenting with the use of ppoll(). It has a timeout capability that uses the hrtimer instead of jiffies. Looks good on my platform, but it needs to be tested on all the hardware that is out there. Often what I thought to be improvements turned out to be worse on some platforms.
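
Roughly, the idea is the following (a sketch only, not the code I am actually testing; function and parameter names are invented):

   /* Sketch only: wait for the RAW socket to become readable with a
    * nanosecond-resolution timeout (ppoll() uses hrtimers), then drain
    * the frame without ever blocking. timeout_ns is assumed < 1 second. */
   #define _GNU_SOURCE
   #include <errno.h>
   #include <poll.h>
   #include <sys/socket.h>
   #include <sys/types.h>
   #include <time.h>

   static int recv_with_ppoll(int sock, void *buf, size_t len, long timeout_ns)
   {
      struct pollfd pfd = { .fd = sock, .events = POLLIN };
      struct timespec ts = { .tv_sec = 0, .tv_nsec = timeout_ns };
      int r = ppoll(&pfd, 1, &ts, NULL);
      if (r <= 0)
         return -1;   /* timeout or error */
      /* MSG_DONTWAIT so this never blocks, even on a spurious wakeup */
      ssize_t n = recv(sock, buf, len, MSG_DONTWAIT);
      if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
         return -1;
      return (int)n;
   }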

See also #605 and #451

ArthurKetels commented 2 years ago

Another remark: do not set RT priorities over 49. The kernel priority is 50. Setting your task priority higher will lead to starvation of internal kernel processes that you depend on (e.g. socket handling). It is possible to do it on isolated CPUs, though, but then again, any priority level will suffice there.
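
As a concrete example (my own sketch, not SOEM code), setting the cyclic task to SCHED_FIFO at priority 49 from within the task itself could look like this:

   /* Sketch: run the calling thread with SCHED_FIFO priority 49, i.e. just
    * below the default priority (50) of the kernel's threaded IRQ handlers,
    * so socket handling and friends are never starved. Needs root or
    * CAP_SYS_NICE. */
   #include <pthread.h>
   #include <sched.h>
   #include <stdio.h>

   static int set_rt_priority(void)
   {
      struct sched_param param = { .sched_priority = 49 };
      int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
      if (err != 0)
         fprintf(stderr, "pthread_setschedparam failed: %d\n", err);
      return err;
   }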

lbckmnn commented 2 years ago

Thank you very much for the answer. I tried both, just adding SOCK_NONBLOCK and the solution from https://github.com/OpenEtherCATsociety/SOEM/issues/451#issuecomment-1049649671 (ppoll with a local DONT_WAIT in receive). The latter solution seems to be the better one. I can provide a patch file or PR if needed or welcome.

However, there is still some strange behavior I can't really explain: I lose significantly more frames with a real network interface than with a USB adapter. I ran the same test as above (on Debian 11), but now with a real-time priority of 40 and for 30 minutes.

[Latency histograms (ethercat_receive) for both interfaces]
USB Ethernet adapter: 10 dropped frames
RTL8111g over mini PCIe: 161 dropped frames

I will look into this with Wireshark, but I don't think these frames are actually lost.

Regarding the IRQ priorities: what is your opinion on setting the priority of the Ethernet IRQ handler to something higher than 50?

ArthurKetels commented 2 years ago

I think you are suffering from interrupt coalescing. You can use the ethtool tool to disable it. For a nice write-up on packet latency, see https://blog.cloudflare.com/how-to-achieve-low-latency/. It would be nice to see how far you can drive latency down. The graphs you present are very informative.

lbckmnn commented 2 years ago

@ArthurKetels I already executed ethtool -C eth0 rx-usecs 0 rx-frames 1 tx-usecs 0 tx-frames 1 from drvcomment.txt; shouldn't that disable interrupt coalescing?

ArthurKetels commented 2 years ago

Some newer NIC drivers use other parameters. Use ethtool -c to list the options for your NIC. Play around a bit to figure out what works and what doesn't.

lbckmnn commented 2 years ago

The output of ethtool -c eth0 is:

   Coalesce parameters for eth1:
   Adaptive RX: n/a  TX: n/a
   stats-block-usecs: n/a
   sample-interval: n/a
   pkt-rate-low: n/a
   pkt-rate-high: n/a

   rx-usecs: 0
   rx-frames: 1
   rx-usecs-irq: n/a
   rx-frames-irq: n/a

   tx-usecs: 0
   tx-frames: 1
   tx-usecs-irq: n/a
   tx-frames-irq: n/a

   rx-usecs-low: n/a
   rx-frame-low: n/a
   tx-usecs-low: n/a
   tx-frame-low: n/a

   rx-usecs-high: n/a
   rx-frame-high: n/a
   tx-usecs-high: n/a
   tx-frame-high: n/a

All options with n/a seem to be unsupported and cannot be changed with -C. I will try some options from your second link. Edit: I tried setting the CPU affinity of the RT thread to one CPU only, but that does not seem to make a big difference.
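
For reference, the affinity pinning was roughly along these lines (a simplified sketch, not my exact code):

   /* Sketch: pin the calling RT thread to a single CPU (here whichever CPU
    * number is passed in). */
   #define _GNU_SOURCE
   #include <pthread.h>
   #include <sched.h>

   static int pin_to_cpu(int cpu)
   {
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(cpu, &set);
      return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
   }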

ArthurKetels commented 2 years ago

Hmm, I know the kernel has undergone some significant changes around IRQ handling around 5.10 (and this is still ongoing). My suggestion is to build your own preempt-rt-patched kernel. It is not that difficult. Only build the features that you really need and turn off everything else. Take the latest from kernel.org (not the Debian-patched one) that is supported by the preempt-rt patch. I got very good results from those home-built kernels.

There are many blogs on the internet about optimizing latency with tweaked kernels. Low-hanging fruit is, for example, the video driver: do not use the nvidia drivers (nouveau is kinda OK). Kick out all task and socket governors.

Setting CPU affinity only helps for your task, not for kernel related latency.

lbckmnn commented 2 years ago

I found this: https://www.spinics.net/lists/linux-rt-users/msg23900.html. This seems to be the same problem. For now I ended up just installing the buster kernel (4.19.0-17-rt-amd64). This produces basically the same plots as in my very first post and also drops no frames. The kernel seems to work just fine with the Debian 11 userland.

I would be very interested in a .config if someone manages to build a 5.x kernel which doesn't have these problems. Also, thank you very much for your help.