OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master

How can I reduce the latency of the `ec_receive` function? #795

Open windsgo opened 3 months ago

windsgo commented 3 months ago

I'm using SOEM to drive a Panasonic motor. The following is how I am working with SOEM.

These two functions take about 230 us in total.

My question is: how can I reduce the latency of the receive call?

Many thanks

ArthurKetels commented 3 months ago

SOEM adds very little delay to the cycle. Most of it is used up in the Linux kernel network stack on receive. Optimizing packet receive to user space handover is a topic that is well described on the internet. Your friend is ethtool, see drvcomment.txt.

On the other hand, it is not optimal to send a packet and then wait for it to return (as you are doing).

Your situation:
- start cycle
- send process data
- receive process data
- calculations
- wait for next cycle start

Optimal solution 1:
- start cycle
- receive process data
- send process data
- calculations
- wait for next cycle start

Optimal solution 2:
- start cycle
- receive process data
- calculations
- send process data
- wait for next cycle start

Solution 1 optimizes compute efficiency, solution 2 optimizes calculation to setpoint delay.
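
For readers following along, here is a minimal sketch of the "optimal solution 1" ordering, assuming the legacy SOEM API (ec_send_processdata / ec_receive_processdata), a 1 ms cycle on CLOCK_MONOTONIC, and that ec_init, ec_config_map and the transition to OP have already been done elsewhere:

```c
#include <time.h>
#include "ethercat.h"            /* SOEM public API */

#define CYCLE_NS 1000000LL       /* 1 ms cycle, assumed for this example */

static void add_ns(struct timespec *ts, long long ns)
{
   ts->tv_nsec += ns;
   while (ts->tv_nsec >= 1000000000L)
   {
      ts->tv_nsec -= 1000000000L;
      ts->tv_sec++;
   }
}

/* Cyclic task with the "optimal solution 1" ordering:
 * receive the frame sent in the previous cycle, immediately send the
 * next frame, then do the calculations for the following cycle. */
void cyclic_task(void)
{
   struct timespec next;
   clock_gettime(CLOCK_MONOTONIC, &next);

   /* prime the loop: put one frame on the wire */
   ec_send_processdata();

   while (1)
   {
      add_ns(&next, CYCLE_NS);
      clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

      /* start of cycle: fetch the frame sent in the previous cycle */
      int wkc = ec_receive_processdata(EC_TIMEOUTRET);
      (void)wkc;                 /* a real application should check the WKC */

      /* outputs prepared during the previous calculation step go out now */
      ec_send_processdata();

      /* this cycle's calculations; their results go out next cycle */
      /* ... application code ... */
   }
}
```

With this ordering the frame travels the ring while the CPU is calculating or sleeping, so the receive at the start of the next cycle normally finds the frame already waiting in the socket buffer.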

B.t.w., using CSP is not a very good control mode. You have very little control over velocity and even less over acceleration and jerk.

windsgo commented 3 months ago

Thank you very much for your suggestions and replies. I will try both the NIC optimization (really grateful for that pointer) and the changed cycle ordering. Since I'm having delay problems here, if the 230 us delay persists I may have trouble using CSV control; I think CSV needs a higher-frequency cycle. Anyway, thanks for your suggestions.

I hope that SOEM really does add very little delay to the cycle. However, I noticed that with a CMake Release build I get about 127 us of delay in the ec_xxx functions, roughly half of what I see with a Debug build (the 230 us mentioned above). I import SOEM into my project with add_subdirectory(), so SOEM should be affected by CMAKE_BUILD_TYPE. Does this imply that there IS some (not so little) delay coming from the SOEM code itself?

windsgo commented 3 months ago

I'm trying optimal solution 1. I call ec_send_processdata immediately after ec_receive_processdata, and I found a problem: ec_receive_processdata clears some of my TxPDO data, specifically the ControlWord of the motor in the PDO I want to send. I set all the TxPDO data in the calculation step of optimal solution 1. Why does this "clear data" behaviour happen?

windsgo commented 3 months ago

I found that I need to set all the TxPDO data between ec_receive_processdata and ec_send_processdata; the previous comment was caused by my wrong programming sequence, sorry. After switching to optimal solution 1, the whole send/receive part takes less than 10 us, EVEN with my generic NIC driver and an external USB network device; the minimum is about 5~6 us. That is obviously much better performance than what I had before.
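
For reference, a minimal sketch of what "setting the TxPDO data between receive and send" can look like, assuming the legacy SOEM API, slave index 1, a hypothetical output PDO layout where the ControlWord (0x6040) is followed by the CSP target position (0x607A), and the usual CiA 402 "enable operation" pattern 0x000F:

```c
#include <stdint.h>
#include "ethercat.h"

/* Hypothetical output PDO layout of the drive; adjust to the real mapping.
 * __attribute__((packed)) is GCC/Clang specific. */
typedef struct __attribute__((packed))
{
   uint16_t controlword;        /* 0x6040 */
   int32_t  target_position;    /* 0x607A */
} drive_outputs_t;

void cycle_once(int32_t next_target)
{
   /* fetch the frame sent in the previous cycle */
   ec_receive_processdata(EC_TIMEOUTRET);

   /* write the outputs of slave 1 before the frame goes out again */
   drive_outputs_t *out = (drive_outputs_t *)ec_slave[1].outputs;
   out->controlword     = 0x000F;   /* assumed "enable operation" word */
   out->target_position = next_target;

   /* put the updated process data on the wire */
   ec_send_processdata();
}
```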

Thanks again for your optimal solutions. Much appreciated.

windsgo commented 3 months ago

@ArthurKetels Sorry to ping you again. I found that when I call ec_receive_processdata(200), the 200 us timeout value does not seem to take effect. I am occasionally blocked in this function for more than 2000 us (that figure includes the call to ec_send_processdata, but the send should not block for long, I think). Does this timeout parameter not work properly?

ArthurKetels commented 3 months ago

This delay is not part of SOEM but of the kernel socket recv() function. The socket is created as non-blocking and with a maximum delay of 1 us, but it is up to the actual NIC driver to honour this. Do you use the PREEMPT-RT kernel? And if so, what is your priority? There are also known problems with NMIs (for BIOS power management) that can generate latencies of up to 2 ms.

First check your Linux system for latency performance. Then check how much extra SOEM packs on top of that.
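
One way to check how much extra SOEM adds is to timestamp the back-to-back receive/send pair in the cyclic task and log the worst case. A minimal sketch, assuming CLOCK_MONOTONIC and the legacy ec_xxx API (printf is fine for a test, but avoid it in the production RT loop):

```c
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include "ethercat.h"

static int64_t ns_since(const struct timespec *a, const struct timespec *b)
{
   return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL +
          (b->tv_nsec - a->tv_nsec);
}

/* Call once per cycle; prints a line whenever the receive+send pair
 * exceeds the previous worst case. */
void timed_exchange(void)
{
   static int64_t worst_ns = 0;
   struct timespec t0, t1;

   clock_gettime(CLOCK_MONOTONIC, &t0);
   ec_receive_processdata(EC_TIMEOUTRET);
   ec_send_processdata();
   clock_gettime(CLOCK_MONOTONIC, &t1);

   int64_t dt = ns_since(&t0, &t1);
   if (dt > worst_ns)
   {
      worst_ns = dt;
      printf("new worst case receive+send: %lld us\n",
             (long long)(dt / 1000));
   }
}
```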

windsgo commented 3 months ago

I'm using the PREEMPT-RT kernel under Ubuntu 22.04 (the official rt kernel from Ubuntu); the thread is scheduled with the SCHED_FIFO policy at priority 99. I also isolate CPUs 2 and 3 on the kernel command line (isolcpus=2,3), set nohz_full=2,3 (I have 8 logical CPUs), and pin my real-time thread to CPU 2. Besides that, I write 0 to /dev/cpu_dma_latency, just as cyclictest does, and it has an obvious effect.
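
A minimal sketch of that thread setup (SCHED_FIFO priority 99, affinity on CPU 2, and /dev/cpu_dma_latency held at 0), assuming Linux with glibc and _GNU_SOURCE for pthread_setaffinity_np:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <unistd.h>

/* Keep /dev/cpu_dma_latency open for the lifetime of the process so the
 * kernel does not enter deep C-states (the same trick cyclictest uses). */
static int lock_cpu_dma_latency(void)
{
   int fd = open("/dev/cpu_dma_latency", O_WRONLY);
   if (fd >= 0)
   {
      int32_t target = 0;
      ssize_t n = write(fd, &target, sizeof(target));
      (void)n;
      /* do NOT close fd: closing it releases the latency request */
   }
   return fd;
}

static void make_realtime(pthread_t thread)
{
   /* SCHED_FIFO, priority 99 */
   struct sched_param sp = { .sched_priority = 99 };
   pthread_setschedparam(thread, SCHED_FIFO, &sp);

   /* pin the thread to the isolated CPU 2 */
   cpu_set_t cpus;
   CPU_ZERO(&cpus);
   CPU_SET(2, &cpus);
   pthread_setaffinity_np(thread, sizeof(cpus), &cpus);
}
```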

When I use cyclictest (from the rt-tests tools), this 2 ms delay never occurs (the maximum over a 24-hour stress test is about 100 us).

Perhaps this is caused by the NMI problem you mentioned. So what can I do about it? I have turned off BIOS options such as C-states, Intel SpeedStep and Intel Speed Shift. (I'm using an Intel i7-6700 CPU; the BIOS is an American Megatrends BIOS, I think.)

And what do you mean by "check how much extra SOEM packs on top of that"? By the way, this 2 ms delay happens roughly once every 5 to 30 minutes, so it is not very frequent, but not rare either.

windsgo commented 3 months ago

I've read through the receive code and I also realize that this may be blocked by the socket recv() function. This is how the socket is set up in nicdrv.c:

```c
/* we use RAW packet socket, with packet type ETH_P_ECAT */
*psock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ECAT));

timeout.tv_sec  = 0;
timeout.tv_usec = 1;
r = setsockopt(*psock, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
r = setsockopt(*psock, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout));
i = 1;
r = setsockopt(*psock, SOL_SOCKET, SO_DONTROUTE, &i, sizeof(i));
```

It seems the socket is not configured as fully non-blocking. So is it possible to create the socket with flags like SOCK_NONBLOCK or O_NONBLOCK?
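
For what it's worth, a raw packet socket can be made truly non-blocking either at creation time or afterwards with fcntl(); a minimal sketch on Linux (this is not how SOEM does it, just an illustration of the flags mentioned above):

```c
#include <arpa/inet.h>        /* htons */
#include <fcntl.h>
#include <sys/socket.h>

#ifndef ETH_P_ECAT
#define ETH_P_ECAT 0x88A4     /* EtherCAT EtherType */
#endif

int open_nonblocking_ecat_socket(void)
{
   /* Option 1: ask for non-blocking behaviour at creation time */
   int sock = socket(PF_PACKET, SOCK_RAW | SOCK_NONBLOCK, htons(ETH_P_ECAT));
   if (sock < 0)
      return -1;

   /* Option 2 (equivalent): set O_NONBLOCK afterwards with fcntl()
    * fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK); */

   return sock;
}
```

Note that with a non-blocking socket recv() returns immediately with EAGAIN when no frame is available, so the caller has to poll (or use poll()/epoll()) itself; the 1 us SO_RCVTIMEO used by SOEM approximates that behaviour.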

By the way, when I use my laptop, which has an AMD Ryzen 7 5800H CPU, this seldom happens. Is this an Intel-specific problem?

windsgo commented 3 months ago

I'd like to add some more information. I logged the cases where the delay grew to around 2 ms as described above. It is always caused inside the ec_xxx functions I call, never by clock_nanosleep.

Neverforgetlove commented 2 months ago

@windsgo Hello, have you already solved this problem? I encountered the same problem when using SOEM.

toshisanro commented 1 month ago

I have solved this problem. I use the acontis kernel module atemsys (https://github.com/acontis/atemsys) and developed a userspace NIC driver. In my project the latency is less than 50 us.

Neverforgetlove commented 1 month ago

@toshisanro Thanks for your reply, it will help me a lot. During my recent testing I found that after adding a large number of PDOs, sending and receiving the data takes much more time, causing heavy jitter in the communication cycle. Have you done any tests on that?

windsgo commented 1 month ago

> @windsgo Hello, have you already solved this problem? I encountered the same problem when using SOEM.

I think the latency is caused by the Linux kernel network stack and the generic NIC driver. So instead of using SOEM, I tried IGH with a modified NIC driver, and that works.

Neverforgetlove commented 1 month ago

> I think the latency is caused by the Linux kernel network stack and the generic NIC driver. So instead of using SOEM, I tried IGH with a modified NIC driver, and that works.

I think so too. I use the Intel generic NIC driver. After adding a large amount of PDO data, the send and receive get heavily jittered. But in nicdrv.c SOEM uses a raw socket, so I'm not sure whether this is caused by IRQs or something else. I haven't used IGH yet; how is it working for you now?

windsgo commented 1 month ago

> I think so too. I use the Intel generic NIC driver. After adding a large amount of PDO data, the send and receive get heavily jittered. But in nicdrv.c SOEM uses a raw socket, so I'm not sure whether this is caused by IRQs or something else. I haven't used IGH yet; how is it working for you now?

SOEM seems to use a raw socket with a 1 us receive timeout, and I think the recv() system call inside ec_receive_processdata(timeout) can still be blocked by the kernel network stack. IGH has a kernel module that drives the hardware directly through a modified NIC driver, and it can be used from userspace through the character-device interface (the ecrt_xxx functions) provided by the IGH project.
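
As an illustration of what the userspace side of IGH looks like, here is a minimal cyclic sketch using the ecrt_xxx API; slave configuration, PDO entry registration, error handling and a proper cyclic timer are all omitted, so treat this as the shape of the code rather than a drop-in example:

```c
#include <stdint.h>
#include <unistd.h>
#include "ecrt.h"   /* IgH EtherCAT master userspace API */

int main(void)
{
   ec_master_t *master = ecrt_request_master(0);
   if (!master)
      return 1;

   ec_domain_t *domain = ecrt_master_create_domain(master);
   if (!domain)
      return 1;

   /* ... slave configuration and PDO entry registration go here ... */

   if (ecrt_master_activate(master))
      return 1;

   uint8_t *domain_pd = ecrt_domain_data(domain);
   (void)domain_pd;            /* read/write process data through this pointer */

   while (1)
   {
      /* fetch frames received since the last cycle */
      ecrt_master_receive(master);
      ecrt_domain_process(domain);

      /* ... application reads inputs / writes outputs in domain_pd ... */

      /* queue the domain data and send the next frame */
      ecrt_domain_queue(domain);
      ecrt_master_send(master);

      usleep(1000);            /* placeholder for a real-time cyclic timer */
   }
}
```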

windsgo commented 1 month ago

> I think so too. I use the Intel generic NIC driver. After adding a large amount of PDO data, the send and receive get heavily jittered. But in nicdrv.c SOEM uses a raw socket, so I'm not sure whether this is caused by IRQs or something else. I haven't used IGH yet; how is it working for you now?

I don't think it is a good idea to send and receive that much data through the PDOs in a real-time, high-frequency setup. In any case, I think plain SOEM is not the best choice on Linux, since SOEM itself does not give you direct access to the hardware there (that requires a dedicated kernel module, I think), while IGH was designed for kernel space from the start. For most network devices the IGH v1.6.0 stable branch provides a modified NIC driver, which is easy to use.

toshisanro commented 1 month ago

@Neverforgetlove I tested with both Xenomai and PREEMPT-RT. I have six servo drives with about 168 bytes of input PDOs and 96 bytes of outputs, at a 1 ms cycle time with DC. The Xenomai results are better than PREEMPT-RT; with PREEMPT-RT the servos keep losing DC sync. Below are the Xenomai results comparing the raw socket with my userspace NIC driver: ec_receive max latency is 84 us (my driver) versus 500 us (socket), the average is less than 50 us (my driver), and cycle jitter is less than 30 us.

toshisanro commented 1 month ago

Additionally, the low-power mode of the CPU may lead to increased latency.

[Screenshot: latency test results]

Neverforgetlove commented 1 month ago

> @Neverforgetlove I tested with both Xenomai and PREEMPT-RT. [...] cycle jitter is less than 30 us.

That sounds great. I have over 700 bytes of input PDOs, and that amount of PDO data causes a large delay in sending and receiving, resulting in large cycle jitter (over 200 us at the maximum), which is unacceptable for me.

windsgo commented 1 month ago

> That sounds great. I have over 700 bytes of input PDOs, and that amount of PDO data causes a large delay in sending and receiving, resulting in large cycle jitter (over 200 us at the maximum), which is unacceptable for me.

What is your communication cycle time at the moment? I think you could actually try IGH; I can use IGH from userspace now as well. With SOEM you have to put together your own userspace driver and kernel module, which is quite a hassle in practice.

Neverforgetlove commented 1 month ago

> What is your communication cycle time at the moment? I think you could actually try IGH; I can use IGH from userspace now as well. With SOEM you have to put together your own userspace driver and kernel module, which is quite a hassle in practice.

4 ms. Testing on CODESYS gives very good results, and I don't know what it does internally. I will try testing with IGH later.

ArthurKetels commented 1 month ago

Please guys, the forum is English language only. Others also want to follow the conversation.