OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master

ec_receive_processdata() takes too long #739

Closed: millerfield closed this issue 9 months ago

millerfield commented 10 months ago

Our setup is a SOEM master on a devboard connected via EtherCAT to two different slaves. We use simple_test.c from the SOEM test/linux directory, modified to measure some timing behaviour. It seems that receiving the EtherCAT frame takes longer than necessary, which prevents us from getting the full performance out of EtherCAT, such as driving a 10 kHz cycle.

The functions ec_send_processdata and ec_receive_processdata are instrumented by setting/resetting GPIO pins on the devboard, so the timing can be measured with a logic analyser or oscilloscope.

for (i = 1; i <= 100000; i++)
{
   gpiod_line_set_value(output_pink, 1);    // Set pink: start of send
   ec_send_processdata();
   gpiod_line_set_value(output_pink, 0);    // Reset pink: end of send
   gpiod_line_set_value(output_yellow, 1);  // Set yellow: start of receive
   wkc = ec_receive_processdata(EC_TIMEOUTRET);
   gpiod_line_set_value(output_yellow, 0);  // Reset yellow: end of receive
}

The function simpletest() runs in its own real-time thread, pinned to the dedicated CPU 3, with priority 80 and the SCHED_FIFO scheduling policy; a minimal sketch of that setup is shown below. The Linux kernel with the PREEMPT_RT patch runs on an i.MX8 with 4 cores.
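A sketch of the thread setup (illustrative only: the real thread body runs simpletest(), and error handling is trimmed):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *rt_thread(void *arg)
{
   /* cyclic EtherCAT send/receive loop goes here */
   return NULL;
}

int main(void)
{
   pthread_t tid;
   pthread_attr_t attr;
   struct sched_param sp = { .sched_priority = 80 };
   cpu_set_t cpus;

   pthread_attr_init(&attr);
   pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
   pthread_attr_setschedpolicy(&attr, SCHED_FIFO);  /* real-time FIFO policy */
   pthread_attr_setschedparam(&attr, &sp);          /* priority 80 */

   CPU_ZERO(&cpus);
   CPU_SET(3, &cpus);                               /* pin to isolated CPU 3 */
   pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);

   if (pthread_create(&tid, &attr, rt_thread, NULL) != 0)
      perror("pthread_create");
   pthread_join(tid, NULL);
   return 0;
}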

uname -a
Linux verdin-imx8mp-14753712 5.15.77-rt53-6.3.0-devel+git.3592e06b1bda #1-TorizonCore SMP PREEMPT_RT Tue Aug 9 12:56:10 UTC 2022 aarch64 GNU/Linux

Additional kernel boot arguments to improve real-time behaviour (CPU 3 isolated, IRQs kept on cores 0-2):

isolcpus=3 irqaffinity=0,1,2
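
To verify the isolation at runtime (standard sysfs path on mainline kernels, should print 3):

cat /sys/devices/system/cpu/isolated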

Pre-startup command (disables interrupt coalescing on the NIC, so every frame is delivered immediately):

ethtool -C ethernet0 rx-usecs 0 rx-frames 1 tx-usecs 0 tx-frames 1

After startup of simple_test, the waveforms on the scope look like this: [screenshot: send_vs_receive]

The pink waveform shows the duration of ec_send_processdata and the yellow one shows ec_receive_processdata. The send function returns after 24 µs, which seems OK. The receive function blocks for about 500 µs. The receive timeout is set to the default 2000 µs. But wouldn't you expect that receiving 1015 bytes over EtherCAT at 100 Mbit/s should take about 100 µs?

Thx Stephan

millerfield commented 10 months ago

Additional information of debug output:

SOEM (Simple Open EtherCAT Master)
Simple test
running as root under PREEMPT_RT
Starting simple test
rtthread: cpu set to 3
ec_init on ethernet0 succeeded.
2 slaves found and configured.
Slaves mapped, state to SAFE_OP.
segments : 1 : 1015 0 0 0
Request operational state for all slaves
Calculated workcounter 6
Operational state reached for all slaves.
Processdata cycle    1, WKC 6 , O: 00 00 00 00 00 00 00 00 I: 01 9c 0f 00 00 00 00 00 T:747407214467583720

ArthurKetels commented 10 months ago

Thanks for the high-quality data. The data looks similar to other boards I have seen with an imx8. The NIC driver is not particularly optimized for real-time latency. My use case did not require high cycle rates, so I never got around to dissecting the driver issue.

For comparison, I have run SOEM on a Cortex-M4 single-core board at 144 MHz, where it achieved 25 kHz without any issues. The SOEM-related latencies for Tx and Rx are below 10 µs on an imx8. The rest is due to kernel context switching and driver latencies.

To loop 1000 bytes of data through the hardware you can calculate it as: (payload in bytes * byte encoding (1 byte = 10 encoded bits) * bit time) + ((1 master + 2 slaves) * PHY delay) = (1000 * 10 * 1e-8) + (3 * 350 * 1e-9) => 110 µs. Here we forget about cable propagation delay at 4.7 ns/m. Yes, 500 µs Rx delay seems to be a bit much.
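
The same estimate as a small compilable sketch (the constants follow the numbers above; the helper name is made up for illustration):

#include <stdio.h>

/* Rough EtherCAT round-trip estimate per the formula above.
 * Assumptions: 100 Mbit/s line rate, 10 encoded bits per payload byte,
 * 350 ns PHY delay per device, cable propagation neglected. */
static double loop_time_us(int payload_bytes, int num_phys)
{
   const double bit_time  = 1e-8;   /* 10 ns per encoded bit */
   const double phy_delay = 350e-9; /* per master/slave PHY */
   return (payload_bytes * 10 * bit_time + num_phys * phy_delay) * 1e6;
}

int main(void)
{
   /* 1000 byte payload, 1 master + 2 slaves => roughly 100 us */
   printf("%.1f us\n", loop_time_us(1000, 3));
   return 0;
}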

Can you check with half the payload? Then you know how the latency depends on load.

millerfield commented 9 months ago

Thanks Arthur, here is the same setup but with 511 bytes of process data.

SOEM (Simple Open EtherCAT Master)
Simple test
running as root under PREEMPT_RT
Starting simple test
rtthread: cpu set to 3
ec_init on ethernet0 succeeded.
2 slaves found and configured.
Slaves mapped, state to SAFE_OP.
segments : 1 : 511 0 0 0
Request operational state for all slaves
Calculated workcounter 6
Operational state reached for all slaves.
Processdata cycle    1, WKC 6 , O: 00 00 00 00 00 00 00 00 I: 01 99 0f 00 00 00 00 00 T:747732383270228400

The duration of ec_receive_processdata is now about 340µs:

[screenshot: scope trace, 2023-09-08]

So it seems to be load dependent.

millerfield commented 9 months ago

I forgot to mention that there is an additional network switch between the master and the slaves. It is needed for observing the frames with a connected PC running Wireshark. Bypassing the switch reduces the latency by about 100 µs. The duration of ec_receive_processdata without the network switch:

511 bytes: ~240 µs
1015 bytes: ~280 µs

I think this is still too much. The next step is to figure out the pass-through delay of both slaves.

ArthurKetels commented 9 months ago

Do you know the EtherCAT hardware that is used in the slaves? By any chance a TI Sitara microcontroller? If you cannot peek inside the slave, a read of slave register 0x0000 (the ESC type register) will also do. The values according to the datasheet are below (a sketch of the register read with SOEM follows the list):

Beckhoff:
  First terminals: 0x01
  ESC10, ESC20:    0x02
  First EK1100:    0x03
  IP Core:         0x04
  Terminals:       0x05
  ET1100:          0x11
  ET1200:          0x12
Other vendors:
  Hilscher: 0x80-0x88
  TI:       0x90
  Test:     0xEE
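
A minimal sketch of the register read (assuming ec_init()/ec_config_init() have already run, so ec_slave[] is populated; error handling omitted):

#include <stdio.h>
#include "ethercat.h"

/* Read the ESC type register (0x0000) of every configured slave. */
void print_esc_types(void)
{
   for (int i = 1; i <= ec_slavecount; i++)
   {
      uint8 esc_type = 0;
      /* FPRD: configured-address read of 1 byte at register 0x0000 */
      int wkc = ec_FPRD(ec_slave[i].configadr, 0x0000,
                        sizeof(esc_type), &esc_type, EC_TIMEOUTRET);
      if (wkc > 0)
         printf("slave %d: ESC type 0x%02x\n", i, esc_type);
   }
}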

millerfield commented 9 months ago

The slaves are an ASIX AX58200 devboard and a simple output terminal with vendor ID 0x59, so no Sitara. The measured delays do not differ whether one or two slaves are used:

9 bytes: ~200 µs

There seems to be an offset of about 200 µs added by the imx8 NIC driver.

ArthurKetels commented 9 months ago

The ASIX chips have similar timing to the ET1100 from Beckhoff. Depending on the PHY used, delays are 200-400 ns. The slaveinfo test program will show you the exact measured propagation delay for each slave.
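
For reference, a typical invocation from the SOEM test directory (using the interface name from above):

./slaveinfo ethernet0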

So the driver offset should be a little less than the measured 200 µs. For a packet with 9 bytes of payload there are 24 bytes of protocol overhead and 12 bytes of inter-packet gap, and, for small packets, padding to reach the minimum frame length of 64 bytes.
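
As a rough check, assuming 100 Mbit/s (80 ns per byte on the wire) and ignoring the 8-byte preamble: the padded 64-byte frame plus the 12-byte gap take (64 + 12) * 80 ns ≈ 6 µs, so nearly all of the measured 200 µs has to come from the NIC driver and network stack.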

Happy hunting for the delay in the NIC driver!

P.S.: For some time I have wanted to use the new io_uring API to poll the Rx ring buffer of the NIC from userspace. This would cut out most of the driver and network stack for packet receive. But I have no idea if your platform supports this API for the NIC. For bare-metal SOEM solutions, direct access to the NIC ring buffers is the default, as there is no NIC driver at all.
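
Something in that direction could look like the sketch below (illustrative only: it uses liburing on a raw socket, which still goes through the kernel socket layer rather than the NIC ring itself, and the interface name "eth0" is an assumption):

#include <liburing.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

#define ETH_P_ECAT 0x88A4   /* EtherCAT EtherType, as used by SOEM */

int main(void)
{
   struct io_uring ring;
   unsigned char buf[1518];

   /* Raw socket bound to the EtherCAT interface. */
   int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ECAT));
   struct sockaddr_ll sll = {
      .sll_family   = AF_PACKET,
      .sll_protocol = htons(ETH_P_ECAT),
      .sll_ifindex  = (int)if_nametoindex("eth0"),
   };
   bind(fd, (struct sockaddr *)&sll, sizeof(sll));

   io_uring_queue_init(64, &ring, 0);

   /* Queue one receive request and wait for its completion. */
   struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
   io_uring_prep_recv(sqe, fd, buf, sizeof(buf), 0);
   io_uring_submit(&ring);

   struct io_uring_cqe *cqe;
   io_uring_wait_cqe(&ring, &cqe);
   printf("received %d bytes\n", cqe->res);
   io_uring_cqe_seen(&ring, cqe);

   io_uring_queue_exit(&ring);
   return 0;
}

With IORING_SETUP_SQPOLL or registered buffers the per-packet syscall overhead can be reduced further, but kernel support for that varies.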

millerfield commented 9 months ago

Understood, so the slave chips are not what adds huge latencies of 200 µs to the communication. slaveinfo shows a delay of 0 ns for the first slave (AX58200) and 700 ns for the second slave (OCE11C). So slave propagation delays are less than 1 µs and can be neglected. I need to dive into the NIC driver related issues when I have time. I have no clue whether the Linux drivers support the io_uring API, or how to use it. But thanks a lot for your help. Can I close and conclude this issue for you?

Kind regards, Stephan