ICube-Robotics / ethercat_driver_ros2

Hardware Interface for EtherCAT module integration with ros2_control
https://icube-robotics.github.io/ethercat_driver_ros2/
Apache License 2.0

Optimizations for achieving higher cycletimes #101

Open PieterjanCottignies opened 6 months ago

PieterjanCottignies commented 6 months ago

I'm trying to control several motors in CSP8 mode and have a trajectory generator integrated into a controller. To get good behaviour I should update the motors at about 1 kHz.

With some optimizations to the NIC settings I am able to run this at 1 kHz without many problems, but now and then I do see state changes occur on the slaves. Looking into this, I noticed that these are likely caused by delays of up to several ms propagating from the ROS 2 control RT loop into the EtherCAT driver. These delays cause noise on the motors and seem to reduce system stability.

Are there other people running higher-rate control loops who have / had the same issues and know of further optimizations to achieve more stable system behaviour?

Current optimizations performed:

* NIC: turn off delays / buffering algorithms / ...
* Isolated core for the ROS 2 RT control loop
* Assign interrupt handlers of the NIC to the same isolated core as the ROS 2 control loop.

Apart from these optimizations, I think the core issue of delays in the ROS 2 control loop propagating to the interface handler can only be avoided by putting the function responsible for reading from / writing to the slaves on a different RT thread.

To achieve this, I would put the Run function into an RT thread that just keeps sending / receiving the current domain to / from the slaves. The Read and Write functions would then update this domain, and the newly written information gets sent to the slaves on the next cycle of the RT communication thread. I believe this is also how it is done in e.g. the SOEM EtherCAT library, and I think also in the SimpleECAT library this driver is based on.

Can anyone confirm / provide more background information about the design behind this, or is this something there has been no need for until now because of longer cycle times and so on?
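The decoupling described above could be sketched roughly like this. This is a minimal illustration, not the driver's actual API: the EtherCAT send/receive calls are replaced by comment stubs, and all class and member names are invented for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch: a dedicated RT communication thread owns the
// process-data image and exchanges it with the bus at a fixed 1 kHz cycle,
// while the ros2_control read()/write() callbacks only copy data in and out
// under a short lock. The IgH calls are stubbed out in comments.
class CyclicExchange {
public:
  explicit CyclicExchange(std::size_t domain_size)
      : domain_(domain_size, 0), running_(true),
        thread_([this] { run(); }) {}

  ~CyclicExchange() {
    running_ = false;
    thread_.join();
  }

  // Called from the ros2_control write(): stage new commands.
  // (In reality this would copy into the mapped domain image.)
  void write_commands(const std::vector<uint8_t>& cmd) {
    std::lock_guard<std::mutex> lk(m_);
    domain_ = cmd;
  }

  // Called from the ros2_control read(): fetch the latest states.
  std::vector<uint8_t> read_states() {
    std::lock_guard<std::mutex> lk(m_);
    return domain_;
  }

  uint64_t cycles() const { return cycle_count_.load(); }

private:
  void run() {
    using clock = std::chrono::steady_clock;
    auto next = clock::now();
    while (running_) {
      {
        std::lock_guard<std::mutex> lk(m_);
        // Stub for: ecrt_master_receive() / ecrt_domain_process()
        //           ecrt_domain_queue()   / ecrt_master_send()
      }
      cycle_count_.fetch_add(1);
      next += std::chrono::milliseconds(1);  // 1 kHz bus cycle
      std::this_thread::sleep_until(next);
    }
  }

  std::vector<uint8_t> domain_;
  std::mutex m_;
  std::atomic<bool> running_;
  std::atomic<uint64_t> cycle_count_{0};
  std::thread thread_;  // declared last so it starts after the other members
};
```

The trade-off of this design is that a command written by the controller is applied one bus cycle later, in exchange for a frame release that no longer jitters with controller load.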

wjx1991 commented 5 months ago

> I'm trying to control several motors with CSP8 mode [...] Current optimizations performed:
>
> * NIC: turn off delays / buffering algorithms / ...
> * Isolated core for ROS 2 RT-control loop
> * Assign interrupt handlers of NIC to the same isolated core as the ROS 2 control loop.
>
> [...] Is there anyone that can confirm / provide more background information about the design behind this [...]

I also have the instability issue. When I set the update frequency to 1 kHz, one of my motors switches between enabled and disabled states frequently. At 100 Hz there is no such issue. Could you please provide more information on how you performed the three optimizations? I don't know how to do that in the code.

PieterjanCottignies commented 5 months ago

@wjx1991 The underlying problem will not be fixed by these optimizations (fixing it would require changes to the EtherCAT driver source code), but they do improve stability a lot.

NIC delay turnoff: check `ethtool` on Linux, specifically the interrupt coalescing settings (for example `ethtool -c <iface>` to inspect and `ethtool -C <iface> rx-usecs 0 tx-usecs 0` to disable, depending on what your NIC supports). These are throughput optimizations that hold messages back, which we definitely don't want for EtherCAT communication.

Isolated cores: this is done by changing the GRUB settings that get applied to the kernel at startup. More info can be found in a Canonical blog post on real-time behaviour optimization.

Assigning IRQs to the same core: I do this with a startup script that runs when the EtherCAT master starts; it looks up the specific NIC IRQs and sets their affinity to the isolated core running the EtherCAT master and driver.
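As an illustration of that IRQ pinning step (a shell one-liner works just as well), a tiny helper along these lines. The `/proc/irq/<n>/smp_affinity_list` layout is the standard Linux interface, but the helper itself is hypothetical, and writing to the real files requires root; the path is parameterized so the logic can be exercised anywhere.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper mirroring the startup script: pin an IRQ to a given
// CPU by writing to <irq_dir>/smp_affinity_list. For a real IRQ, irq_dir
// would be e.g. "/proc/irq/123" (root required).
bool set_irq_affinity(const std::string& irq_dir, int cpu) {
  std::ofstream f(irq_dir + "/smp_affinity_list");
  if (!f) return false;  // no permission, or the IRQ directory is missing
  f << cpu << '\n';
  return static_cast<bool>(f);
}
```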

wjx1991 commented 5 months ago

> [quotes PieterjanCottignies' answer above in full]

Sorry for the late reply. Thank you very much.

yguel commented 4 months ago

Hi Pieterjan,

To be able to help you, we would like more information about the computer configuration your application runs on. I understand that you use Ubuntu with PREEMPT_RT enabled, am I correct? From our experience, at such frequencies there is no guarantee, even with a separate thread, that you will get no latency in the command loop, unless you use an OS with strict RT guarantees like Xenomai (https://xenomai.org/). The maximum frequency we tested our systems at with Ubuntu was 100 Hz; when we needed higher frequencies we used Xenomai.

PieterjanCottignies commented 4 months ago

Hi Yguel,

I'm indeed using an RT-patched minimal Ubuntu installation, running on an i5 IPC with 6 cores. I also optimized the OS for real-time behaviour (BIOS settings / core isolation / ...). The cyclictest tool shows latencies consistently under 100 µs even under 100% system load, so I think the real-time behaviour of the system is OK.

I also mapped the ROS 2 control real-time loop onto an isolated core, as well as the EtherCAT kernel module and the interrupt threads of the bound NIC. All in all this shows good real-time performance, and 1 kHz does not seem to be a problem to achieve.

The only thing I saw was that, because the Read and Write functions control the frame release of the EtherCAT communication, any load inside a controller or driver has a direct impact on it and thus increases the jitter on the frame release. This would sometimes lead to slave state changes. To achieve more stable EtherCAT communication I was thinking of moving the communication and frame control to its own real-time thread and having the Read and Write functions just fetch the latest data and return it.

It is a suggestion to improve performance and maybe communication stability in general.

I also have other suggestions for improvement and can / want to help with the implementation, but CONTRIBUTING.md says to open an issue for features / bugs / ..., so I wanted to wait for your response before continuing.

yguel commented 4 months ago

Hi Pieterjan, we would like to reproduce your setup. Can you give us as much information as possible about your project? We could also try with your project if you can share it with us.

Here are a list of questions that may help us find the source of the problem:

1. What Ethernet driver do you use? In particular, if you use the generic Ethernet driver when configuring the IgH EtherCAT master, we have observed that it may not be optimized for RT. If you can switch to one of the drivers supported by Etherlab, it will be closer to what we know works well.
2. Can you describe which tests (load and duration) you have performed to verify that your RT config is correct?
3. What are your kernel version, Ubuntu version and ROS 2 version?
4. What amount of data is exchanged on the bus at 1 kHz?

You used the correct way to raise your problem; if I have not answered earlier, it is because of an oversight on my part.

PieterjanCottignies commented 4 months ago

Hi yguel,

Below the information.

1. I indeed use the generic driver; I will look into whether I can test with another one.
2. To verify the real-time behaviour of the system I used cyclictest; I let it run for about 1 h and use a tool like stress-ng to load the CPU cores to 100% before drawing conclusions.

I also captured a tcpdump to analyse the timings of the frames sent out by the EtherCAT driver. There I noticed that there is some jitter on the times at which the frames are released. Jitter I define here as the deviation from the optimal frame release time (previous cycle start + cycle time). The deviation was an average of around

3. Currently running kernel 5.15.119-rt65, but I get more or less the same results with 6.1.77-rt24; Ubuntu 22.04.3 LTS; ROS 2 Humble.
4. The data amounts to 150 bytes of Rx- and Tx-PDO together, exchanged cyclically. It should easily fit in one frame.

The test setup runs 4 motors in CSP8 mode at a 1 kHz update interval. I'm looking for jitter because I can sometimes hear vibration noises on the motors and am searching for their origin.

As the sending of the frames is coupled to the triggering of the Read and Write functions of the ROS 2 RT loop, I was wondering if this could be an issue and whether it would be better to put the frame releases on a different RT thread and let the Read and Write functions just exchange data between the domains and the ROS 2 control interfaces. This was the origin of my question. I also notice some slave state changes, like those shown below, that might have something to do with this as well:

[ros2_control_node-1] Domain: WC 0.
[ros2_control_node-1] Domain: State 0.
[ros2_control_node-1] Domain: WC 12.
[ros2_control_node-1] Domain: State 2.
yguel commented 4 months ago

Thank you for the details.

[ros2_control_node-1] Domain: WC 0.
[ros2_control_node-1] Domain: State 0.
[ros2_control_node-1] Domain: WC 12.
[ros2_control_node-1] Domain: State 2.

Those messages say that the working counter (WC: a counter that increments during the EtherCAT cycle each time a slave gets the EtherCAT message frame) was 0, meaning no slave got the datagram (no registered process data were exchanged). This points to a problem other than the RT problem we are trying to investigate, since a delay/jitter should not prevent a message from circulating correctly on the bus. This may be a problem with the driver of the Ethernet card, so the first thing would be to test with the dedicated driver for your Ethernet card. Normally you will have more meaningful messages in your kernel logs that may help us find the cause of the problem. When you look at the output of the `dmesg` command, can you find logs related to EtherCAT missing frames?

The correct State is 2 (all registered process data were exchanged), so I assume that normally you have 12 slaves on your EtherCAT bus, am I correct?
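For reference, a rough sketch of how those `WC` / `State` pairs can be interpreted in code. The enum and struct below are a local mirror of the IgH `ec_wc_state_t` / `ec_domain_state_t` types written for illustration; check them against the `ecrt.h` of your master version before relying on the exact layout.

```cpp
#include <cassert>
#include <string>

// Local mirror of the IgH domain-state types (assumption: verify against
// your ecrt.h). State 0/1/2 in the log lines corresponds to this enum.
enum WcState { WC_ZERO = 0, WC_INCOMPLETE = 1, WC_COMPLETE = 2 };

struct DomainState {
  unsigned int working_counter;  // e.g. "Domain: WC 12"
  WcState wc_state;              // e.g. "Domain: State 2"
};

// Classify a domain state sample the way the log lines in this thread read.
std::string describe(const DomainState& s, unsigned int expected_wc) {
  if (s.wc_state == WC_COMPLETE && s.working_counter == expected_wc)
    return "all registered process data exchanged";
  if (s.wc_state == WC_ZERO)
    return "no slave processed the datagram";
  return "partial exchange";
}
```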

Other possibilities to feed your research:

  1. The problem might come from the motor interface electronics. Are the drives using CAN over EtherCAT? Someone in the team experienced problems like that while running tests at 1 kHz without ROS. It was not related to RT problems, since he used Xenomai and the stress tests were good when he was not using the drives. The problem was coming from the drives themselves. I will try to compile his setup to make that problem report available to the package users.

  2. Another potential source of RT problems to eliminate is your own controller. You have probably already done that, so I mention it for completeness: when you had jitter, were you testing with or without your controller in the loop, to verify that the problem does not come from there?

Side note: I would like to write documentation to help users investigate the kind of problems you had. It looks like you have done good and extensive tests. Guidelines for debugging such annoying problems would greatly help people. If you are interested in such a document, I would greatly appreciate your feedback.

PieterjanCottignies commented 4 months ago

Hi,

  1. After initial setup no acyclic SDO access is done to the slaves.
  2. I was preparing a simple test where I would apply a sine wave to a single motor and monitor the following error and so on in the hopes of finding out if the source is application or driver related. I'll also test with a kernel module built for the specific network driver.

Additional documentation on how to debug common issues sounds nice; I can provide feedback for sure, or help write it if you want (maybe DM me / start a discussion thread for this?).

What I have done so far:

-> The conclusion of these tests seems to be that there is some jitter (±400 µs max) on the frame release timings, but with an acceptable average of around ±85 µs, which is still well within the cycle time. Further analysis and different test cases under different loads are still to be performed before I can be sure of my findings.

Side question: for exposing diagnostics and statistics such as the master / slave / domain states and timings from the driver, how do you think this is best done? At the moment I made an extra ROS 2 node that uses a queue fed from the real-time thread to publish data on a diagnostics topic. It just feels a bit like working around the ROS 2 control framework, so I was wondering if you know a common / better way of doing this.

Output of the dmesg:

[19216.293852] EtherCAT: Requesting master 0...
[19216.293861] EtherCAT: Successfully requested master 0.
[19216.486231] EtherCAT 0: Domain0: Logical address 0x00000000, 116 byte, expected working counter 12.
[19216.486238] EtherCAT 0:   Datagram domain0-0-main: Logical offset 0x00000000, 116 byte, type LRW.
[19216.486261] EtherCAT 0: Master thread exited.
[19216.486264] EtherCAT 0: Starting EtherCAT-OP thread.
[19216.486616] EtherCAT 0:  binding thread to cpu 3
[19216.486629] EtherCAT WARNING 0: 1 datagram UNMATCHED!
[19217.609445] EtherCAT 0: Domain 0: Working counter changed to 2/12.
[19217.970384] EtherCAT 0: Slave states on main device: OP.
[19218.610434] EtherCAT 0: Domain 0: 7 working counter changes - now 12/12.
[19227.421439] EtherCAT 0: Domain 0: Working counter changed to 0/12.
[19227.486376] EtherCAT WARNING 0: 7 datagrams TIMED OUT!
[19227.486381] EtherCAT WARNING 0: 7 datagrams UNMATCHED!
[19228.422433] EtherCAT 0: Domain 0: Working counter changed to 12/12.
[19236.993434] EtherCAT 0: Domain 0: Working counter changed to 0/12.
[19237.486378] EtherCAT WARNING 0: 4 datagrams TIMED OUT!
[19237.486385] EtherCAT WARNING 0: 4 datagrams UNMATCHED!
[19237.994438] EtherCAT 0: Domain 0: Working counter changed to 12/12.
[19244.931437] EtherCAT 0: Domain 0: Working counter changed to 0/12.
[19245.486374] EtherCAT WARNING 0: 4 datagrams TIMED OUT!
[19245.486380] EtherCAT WARNING 0: 4 datagrams UNMATCHED!
[19245.932432] EtherCAT 0: Domain 0: Working counter changed to 12/12.
[19247.454438] EtherCAT 0: Domain 0: Working counter changed to 0/12.
[19247.486379] EtherCAT WARNING 0: 3 datagrams TIMED OUT!
[19247.486385] EtherCAT WARNING 0: 3 datagrams UNMATCHED!
[19248.455431] EtherCAT 0: Domain 0: Working counter changed to 12/12.
[19249.456443] EtherCAT 0: Domain 0: 2 working counter changes - now 12/12.
[19249.486374] EtherCAT WARNING 0: 7 datagrams TIMED OUT!
[19249.486380] EtherCAT WARNING 0: 7 datagrams UNMATCHED!

-> This leads me to believe that frames were skipped, and that I should thus indeed try the dedicated driver and check the NIC settings?

yguel commented 4 months ago

Hi, the messages in dmesg look strange; so far we have never seen TIMED OUT followed by UNMATCHED. As far as we understand:

It looks interesting, though, that the number of datagrams always matches between TIMED OUT and UNMATCHED. The delay is around 6 µs, (min, max) = (5 µs, 7 µs), which is way below 1 ms.

You could have more detailed kernel messages if you enable debug messages in the IgH EtherCAT master using the following command:

ethercat debug 1

What kind of Ethernet controller do you have? What is the native driver for your Ethernet controller? Depending on your kernel version and your Ethernet controller, IgH may not have a native driver implemented. You can find a compatibility matrix here: https://docs.etherlab.org/ethercat/1.5/doxygen/devicedrivers.html

For statistics, that is a tough ros2_control question! I think you are forced to work around the ros2_control framework: like any framework with hard RT constraints, it is not designed for logging, since that would seriously degrade performance. Your approach seems good to me. Someone in the team made performance benchmarks using realtime_tools::RealtimePublisher; you can have a look at https://github.com/mcbed/r2c_perf_bmark/blob/master/bmark_controllers/src/bmark_controllers.cpp where that benchmark was implemented.

PieterjanCottignies commented 4 months ago

Hi,

Thanks for the feedback.

I see it's an Intel I210-based chip running the igb driver at the moment, if I'm not mistaken. I also see that kernel version 5.15 supports the igb driver, thanks for the link!

I'm trying to build a new kernel module with the igb driver enabled but am having some issues at the moment, so I'll continue tomorrow / next week. I'll also check the kernel messages then.

I can investigate a lot more already with your guidelines and I will give an update when I have more findings. Thanks for the help!

For the logging bit, I'm using a single-producer single-consumer queue at the moment to tunnel the data out of the RT thread. I started with the RT publishers but found the queue a bit easier. Once out of the RT context, I use a normal node to do some processing and publish diagnostics. It felt a bit hacky, so it's interesting to see how other people tackle this.
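A minimal sketch of such a single-producer / single-consumer queue, for readers who want to try the same pattern. This is illustrative, not the implementation actually used here: push never blocks (the RT thread drops samples when the buffer is full), and only one producer thread and one consumer thread may touch it.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Lock-free single-producer/single-consumer ring buffer, the kind of
// structure described for tunneling diagnostics out of the RT thread.
template <typename T, std::size_t N>
class SpscQueue {
  static_assert((N & (N - 1)) == 0, "N must be a power of two");
public:
  // RT thread side: never blocks; drops the sample when full.
  bool push(const T& v) {
    const auto h = head_.load(std::memory_order_relaxed);
    const auto t = tail_.load(std::memory_order_acquire);
    if (h - t == N) return false;  // full
    buf_[h & (N - 1)] = v;
    head_.store(h + 1, std::memory_order_release);
    return true;
  }

  // Non-RT node side: drains samples and publishes them at leisure.
  std::optional<T> pop() {
    const auto t = tail_.load(std::memory_order_relaxed);
    const auto h = head_.load(std::memory_order_acquire);
    if (t == h) return std::nullopt;  // empty
    T v = buf_[t & (N - 1)];
    tail_.store(t + 1, std::memory_order_release);
    return v;
  }

private:
  std::array<T, N> buf_{};
  std::atomic<std::size_t> head_{0}, tail_{0};
};
```

Dropping on overflow (rather than blocking or allocating) is what keeps the producer side RT-safe.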

PieterjanCottignies commented 3 months ago

Hi

Update from my side.

I tried the igb driver but could not get a connection to the slaves any more. I see in the dmesg output that it found the interface and uses it, but something else is probably not set up right, or I need another driver. I will take a further look later.

I also did some further testing, including a long-running test timing the triggering of the Read() and Write() functions.

Jitter is defined as the deviation from the ideal cycle time (previous trigger + 1 ms).

| Measurement | Value (ms) |
| --- | --- |
| read_points | 56913355 cycles |
| cycle_read_Avg | 0.999782 |
| cycle_read_Max | 1.111706 |
| cycle_read_Stddev | 0.002101 |
| jitter_read_Avg | 0.001008 |
| jitter_read_Max | 0.116172 |
| cycle_write_Avg | 0.999732 |
| cycle_write_Max | 1.493422 |
| cycle_write_Stddev | 0.008850 |
| jitter_write_Avg | 0.004942 |
| jitter_write_Max | 0.503350 |
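For reference, figures like the jitter averages and maxima above can be derived from raw trigger timestamps roughly as follows. This is a sketch under the stated definition (jitter per cycle = |actual interval - nominal cycle time|); the actual measurement code is not shown in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct JitterStats {
  double avg_ms;  // average |interval - nominal| in ms
  double max_ms;  // worst-case |interval - nominal| in ms
};

// Compute jitter statistics from trigger timestamps in nanoseconds,
// given the nominal cycle time (1'000'000 ns for a 1 kHz loop).
JitterStats jitter_stats(const std::vector<uint64_t>& stamps_ns,
                         uint64_t cycle_ns) {
  JitterStats s{0.0, 0.0};
  if (stamps_ns.size() < 2) return s;
  double sum = 0.0;
  for (std::size_t i = 1; i < stamps_ns.size(); ++i) {
    const double interval =
        double(stamps_ns[i]) - double(stamps_ns[i - 1]);
    const double jitter_ms = std::abs(interval - double(cycle_ns)) / 1e6;
    sum += jitter_ms;
    if (jitter_ms > s.max_ms) s.max_ms = jitter_ms;
  }
  s.avg_ms = sum / double(stamps_ns.size() - 1);
  return s;
}
```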

These seem to be OK for our purpose, so I don't think I have a timing issue. A frame grab over 1000000 frames also resulted in the following:

| Measurement | Value |
| --- | --- |
| Number of frames | 1000000 |
| Average frame release time | 1.000 ms |
| Max frame release time | 2.024 ms |
| Average receive delay | 0.043 ms |
| Max receive delay | 1.017 ms |

I had 1 frame 'overrun' (delay > cycle time) during this collection period. This, however, is not blocking, and the software can handle a single skipped cycle like this.

I still have the datagram warnings, but these do not seem to coincide with the perceived jitter (the jitter on the motor manifesting as modem-like noises). I believe it is still jitter, because the torque and controller effort show spikes while it occurs.

Further investigation currently points me to the controlling model (are all updates from the trajectory generator neatly passed to the ethercat_driver, ...) and the DC-sync implementation. In the DC-sync implementation I did see the following:

```c
clock_gettime(CLOCK_REALTIME, &t);
ecrt_master_application_time(master_, EC_NEWTIMEVAL2NANO(t));
ecrt_master_sync_reference_clock(master_);
ecrt_master_sync_slave_clocks(master_);
```

`CLOCK_REALTIME` is used instead of `CLOCK_MONOTONIC`. This clock can experience NTP sync events that shift it, which has bad effects on slave operation. The example implementation of the IgH master also uses `CLOCK_MONOTONIC`. This might explain the sudden stopping of the slaves after an hour or so, because I'm now running for almost a day without many problems (but still with jitter).
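A sketch of the suggested fix: take the DC application time from `CLOCK_MONOTONIC`, so NTP step corrections cannot jump it. The IgH calls appear only in comments since they need a live master; the conversion helper mirrors the nanosecond convention the IgH macros use, but its name here is invented.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Illustrative stand-in for the IgH time-conversion macros: collapse a
// timespec into a single nanosecond count.
static inline uint64_t timespec_to_ns(const struct timespec& t) {
  return static_cast<uint64_t>(t.tv_sec) * 1000000000ull +
         static_cast<uint64_t>(t.tv_nsec);
}

// Application time for DC sync, taken from the monotonic clock so that
// NTP step corrections (which move CLOCK_REALTIME) cannot disturb it.
uint64_t application_time_now() {
  struct timespec t;
  clock_gettime(CLOCK_MONOTONIC, &t);
  // With the IgH master this value would then feed, as in the snippet above:
  //   ecrt_master_application_time(master_, application_time_now());
  //   ecrt_master_sync_reference_clock(master_);
  //   ecrt_master_sync_slave_clocks(master_);
  return timespec_to_ns(t);
}
```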

So I'm investigating further, and if I find something I'll share it again, but at the moment it does not seem correlated to the ethercat_driver implementation. Thanks for the help; further suggestions are still welcome.