OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master

Suggestions on Improving the reliability of EtherCat master #642

Closed WillyTuring closed 1 year ago

WillyTuring commented 1 year ago

Hi Arthur, Hi Folks,

At the moment I'm running the EtherCAT loop at a 4 ms interval, and I'm looking for suggestions on how to reduce the interval from 4 ms down to 1 ms or even lower if possible. The minimum interval the slave supports is 250 µs.

More specifically, for those of you who have reached this level of reliability: do you use stock Linux, another OS, or is there some other "secret" technique?

Thanks in advance for the help.

ArthurKetels commented 1 year ago

I have run SOEM on bare-metal (no OS) microcontrollers, absolutely reliably above 10 kHz. By that I mean uptime longer than a year, packet loss << 1e-10, jitter << 1 µs.

By comparison, a modern PC system (standard or industrial) is horrendously complex, over-powered, and extremely bureaucratic. It is like asking a big city to organize a family dinner for you. You could end up hungry, with a truckload of food at the wrong place, and in jail for not following the rules.

Take for example the interrupt latency of modern PC microprocessors from Intel and AMD. Often they take north of 10K cycles to switch to an interrupt routine. And if you include cache thrashing it can go up to 100K cycles. That is more than the whole control program of many SOEM applications takes. Data flow in PC hardware is built on queues, ordering, and serialization. RT control requires fast reaction. PC hardware is simply not designed for real-time applications.
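To put those cycle counts next to the cycle times discussed in this thread, here is a rough worked conversion. The 3 GHz core clock is an assumption chosen for illustration; it is not a figure from the thread, and real parts vary.

$$
t = \frac{N_{\text{cycles}}}{f_{\text{clk}}}, \qquad
\frac{10^{4}}{3 \times 10^{9}\,\text{Hz}} \approx 3.3\,\mu\text{s}, \qquad
\frac{10^{5}}{3 \times 10^{9}\,\text{Hz}} \approx 33\,\mu\text{s}
$$

$$
\frac{33\,\mu\text{s}}{250\,\mu\text{s}} \approx 13\%, \qquad
\frac{33\,\mu\text{s}}{1\,\text{ms}} \approx 3.3\%
$$

Under that assumption, a single worst-case interrupt entry would consume roughly an eighth of a 250 µs cycle, before any control code has run.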

Then what to do if you are stuck with PC hardware and Linux? The simple answer is: strip everything you do not need.

First, select hardware that matches the job at hand as closely as possible. Select a processor that can run continuously at its highest P-state. You do not want it to switch to lower P-states to keep the CPU from overheating. Select a performance level that matches your needs; it is no use burning power in idle loops. Do not have more cache levels than needed, as they increase latency. If there is hardware that is not needed, make sure it can be disabled. For example, in some laptops temperature sensors and battery gauge sensors can cause unavoidable latency in the ms range. Oh, and please select a platform with a decent clock.

Second, strip all functionality from the OS that you do not need. Take a vanilla PREEMPT_RT kernel from kernel.org and configure it with only the modules you need. The less you can get away with, the better the performance will be. Then compile the kernel and install it. Systems without USB / graphics / serial ports / RAID / crypto will perform better. What you minimally need is support for: CPU, memory, storage, timing, and wired network.

With the above strategy I have had systems with SOEM running at 5kHz without issues.
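For readers who want a starting point on the application side, the following is a minimal sketch of a fixed-period loop on Linux. It is not taken from the thread or from SOEM. It assumes a Linux system with the POSIX calls `mlockall`, `sched_setscheduler`, and `clock_nanosleep`; the helper names `timespec_add_ns`, `timespec_diff_ns`, `rt_setup`, and `run_cyclic` are hypothetical. The 200 µs period in the usage note corresponds to the 5 kHz mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute deadline by period_ns, keeping tv_nsec in [0, 1e9). */
void timespec_add_ns(struct timespec *ts, long period_ns)
{
    assert(period_ns >= 0 && period_ns < NSEC_PER_SEC);
    ts->tv_nsec += period_ns;
    if (ts->tv_nsec >= NSEC_PER_SEC) {
        ts->tv_nsec -= NSEC_PER_SEC;
        ts->tv_sec += 1;
    }
}

/* Signed difference a - b in nanoseconds. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (int64_t)(a->tv_nsec - b->tv_nsec);
}

/* Best-effort real-time setup: lock all pages in RAM and request SCHED_FIFO.
 * Usually needs root or CAP_SYS_NICE / CAP_IPC_LOCK. Returns 0 on success. */
int rt_setup(int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -1;
    return 0;
}

/* Run `cycles` iterations at a fixed period using absolute deadlines, so
 * that lateness in one cycle does not accumulate as drift. `work` may be
 * NULL. Returns the worst observed wake-up lateness in nanoseconds. */
int64_t run_cyclic(long period_ns, int cycles, void (*work)(void))
{
    struct timespec next, now;
    int64_t worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < cycles; i++) {
        timespec_add_ns(&next, period_ns);
        /* Sleep until the absolute deadline; retry if a signal interrupts. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t late = timespec_diff_ns(&now, &next);
        if (late > worst)
            worst = late;
        if (work)
            work(); /* the process data exchange would go here */
    }
    return worst;
}
```

A caller might run `rt_setup(80)` once (the priority value is an arbitrary example), then `run_cyclic(200000L, 5000, my_cycle)` and print the returned worst-case lateness, where `my_cycle` is a placeholder for the function that performs the SOEM process data exchange. Measuring lateness this way gives a quick indication of whether kernel and hardware stripping actually reduced jitter on a given machine.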

Look at it as if transitioning from a camper van (high utility) to a race car (performance).

WillyTuring commented 1 year ago

Hi Arthur, thanks for such an informative suggestion, as always. What you said about stripping everything that is not needed from the OS to help real-time performance makes a lot of sense.

From what you suggested, this is what I've gathered. Please correct me if I'm wrong:

1. First, tackle the elephant in the room by rebuilding Linux with the PREEMPT_RT kernel patch and only the minimum modules needed.
2. Second, deal with the individual parts that might cause latency, such as sensors.

Do you think allocating one or two specific CPU cores to the SOEM master process would help? Also, do you have any resources you could point me to for tackling the problems above? If not, I will have to google them one by one.
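The thread does not answer whether dedicating a core helps, so the following is only a sketch of how pinning is commonly done on Linux, for anyone who wants to measure the effect themselves. It assumes glibc's `sched_setaffinity` and `sched_getaffinity`; the helper names `pin_to_cpu` and `is_pinned_to` are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to a single CPU core.
 * Returns 0 on success, -1 on error or invalid core index. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return 1 if the calling thread is allowed to run only on `cpu`, else 0. */
int is_pinned_to(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return 0;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

Pinning only restricts where this thread runs; it does not stop other tasks from being scheduled on the same core. Keeping other work off that core is typically handled separately, for example with the `isolcpus` kernel boot parameter.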

WillyTuring commented 1 year ago

Hi Arthur, any thoughts on the previous follow-up question? And just a quick update on the control.

[image: velocity graph, in rad/s; the x-axis is the EtherCAT cycle]

[image: torque command graph, in percent of rated torque; the x-axis is the EtherCAT cycle]

I'm doing feedback control in every EtherCAT cycle. I feel the torque command curve could be better, but for now I'm looking for a way to lock in the final position precision first, which is more important. What suggestions do you have for improving and ensuring the final position precision?

nakarlsson commented 1 year ago

@WillyTuring, can we close this issue? Suggestions have been given.