OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master

EtherCAT Communication keeps dropping #443

Closed venkisagunner93 closed 3 years ago

venkisagunner93 commented 4 years ago

The following is my setup:

The application I wrote using SOEM works fine for the most part. Sporadically, however, the connection between the EtherCAT master and the slaves drops, causing undesirable behavior. What are some potential reasons for this? Also, are there any parameters in the SOEM library that can be relaxed to address the connection dropouts (a connection timeout value, for example)?

Please note that I may not have provided all of the information. I can definitely get and provide more information if pointed out. I will definitely update this issue whenever I identify useful information regarding the problem.

ArthurKetels commented 4 years ago

Connection on the physical layer is nothing SOEM can control. It is the NIC on the master and PHY on the slave that establish and keep connection. SOEM just taps in on a RAW socket provided by Linux. Link up or down is monitored in the Linux NIC device driver.

The way SOEM works is to transmit a packet and wait for a limited time for a response. If nothing gets in then transmit again until the timeout value is passed or a valid packet has been read.
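The transmit-and-retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not SOEM's actual code: the callback types, `send_and_confirm`, and the `fake_link` simulation are all hypothetical names introduced here, and time is tracked with a simple counter instead of a real timer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks for illustration; these are not SOEM functions. */
typedef void (*send_fn)(void *ctx);
/* Returns 1 if a valid frame arrived within wait_us, otherwise 0. */
typedef int (*recv_fn)(void *ctx, int wait_us);

/* Send a frame and wait for its return. If nothing arrives within retry_us,
 * send again, until timeout_us in total has been spent.
 * Returns 1 on success, 0 if the overall timeout expired. */
int send_and_confirm(send_fn send, recv_fn recv, void *ctx,
                     int timeout_us, int retry_us)
{
    int elapsed_us = 0;

    assert(send != NULL && recv != NULL && retry_us > 0);
    while (elapsed_us < timeout_us) {
        int wait_us = timeout_us - elapsed_us;
        if (wait_us > retry_us)
            wait_us = retry_us;      /* never wait longer than one retry slot */
        send(ctx);
        if (recv(ctx, wait_us))
            return 1;                /* valid frame came back */
        elapsed_us += wait_us;       /* nothing came back, try again */
    }
    return 0;
}

/* Simulated link that loses the first drop_first frames (demo only). */
struct fake_link {
    int drop_first;
    int sent;
};

void fake_send(void *ctx)
{
    ((struct fake_link *)ctx)->sent++;
}

int fake_recv(void *ctx, int wait_us)
{
    const struct fake_link *link = ctx;
    (void)wait_us;                   /* the simulation ignores real time */
    return link->sent > link->drop_first;
}
```

With a simulated link that loses two frames, `send_and_confirm(fake_send, fake_recv, &link, 2000, 500)` succeeds on the third transmission. If the link loses every frame, the call gives up after four transmissions, once the 2000 us budget is used up.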

There are numerous reasons why a packet does not return to the master:

  1. No packet is actually transmitted over the wire from master to slave because dropped in stack (should not happen).
  2. Packet can not be transmitted because there is no physical link established (link down).
  3. Packet is transmitted but is corrupted by bad electrical connections.
  4. Packet is transmitted but is corrupted by EMI.
  5. Packet is not received by slave because of closed port on slave ESC.
  6. Packet is not transmitted by slave because of same reasons 1..4 above.
  7. Packet is not transmitted by slave because of power drop / reset in slave.
  8. Packet is transmitted on wrong port on slave due to force open of unused ESC port.
  9. Packet is received by master but is dropped in the network stack.

You have to remember that each packet in your set-up (4 slaves) has to make 8 hops to return to the master. If one of them fails, nothing is returned.

Do you have single packet loss or is it a long disconnect (like half a second)? A long disconnect mostly indicates a link loss and reconnect cycle.

For detailed help you should capture data traffic with Wireshark and post zipped .pcap files here. This way we can analyze the packet traffic and timing. Very long captures (>1 min) should be cut around the observed dropouts.

For individual packet loss you can read out the error counters available in each EtherCAT slave.
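As a rough illustration of what reading those counters involves, the sketch below decodes an 8-byte block of per-port error counters. The register layout is an assumption to be checked against the ESC datasheet: the block is taken to start at ESC register 0x0300, with one invalid-frame counter byte followed by one RX-error counter byte for each of four ports. In a SOEM application the raw bytes could presumably be fetched with a register read such as `ec_FPRD()` on the slave's configured address. That read is left out here so the example stays self-contained, and `decode_rx_errors` and `first_port_with_errors` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ESC_PORTS 4

/* Per-port counters. Assumed layout: registers 0x0300..0x0307,
 * two bytes per port (verify against your ESC datasheet). */
struct port_errors {
    uint8_t invalid_frames;  /* assumed: even offset within the block */
    uint8_t rx_errors;       /* assumed: odd offset within the block  */
};

/* Split the raw 8-byte register block into per-port counters. */
void decode_rx_errors(const uint8_t raw[2 * ESC_PORTS],
                      struct port_errors out[ESC_PORTS])
{
    assert(raw != NULL && out != NULL);
    for (int p = 0; p < ESC_PORTS; p++) {
        out[p].invalid_frames = raw[2 * p];
        out[p].rx_errors = raw[2 * p + 1];
    }
}

/* Return the first port with a nonzero counter, or -1 if all are clean. */
int first_port_with_errors(const struct port_errors e[ESC_PORTS])
{
    assert(e != NULL);
    for (int p = 0; p < ESC_PORTS; p++) {
        if (e[p].invalid_frames != 0 || e[p].rx_errors != 0)
            return p;
    }
    return -1;
}
```

Comparing the counters slave by slave and port by port narrows down which cable segment or connector is corrupting frames.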

venkisagunner93 commented 4 years ago

Great. I was thinking of the same route, capturing the packets. My bash script didn't capture the tcpdump output. Let me fix my script, get the dump, and analyze it. Thanks for the help. I will keep you posted on this issue.

ArthurKetels commented 4 years ago

tcpdump only works with sudo. You need raw capture privileges.

monroe-git commented 3 years ago

@venkisagunner93 @ArthurKetels Hi, I'm an EtherCAT newbie. Could I ask something about my setup? I ran 'slaveinfo' on two experimental setups. I don't think I can use SDO with the EPOS4. How can I use the EPOS4 with SOEM?

  1. Experiment 1
    • Master PC
    • Ubuntu 16.04
    • Xenomai 3.0.5
    • SOEM 1.4
  2. Experiment 2
    • Master PC: same as above

These are the results from 'eepromtool': sii_epos4.txt, sii_maxpos.txt

ArthurKetels commented 3 years ago

@monroe-git, your slaveinfo output does not show anything problematic. EPOS4 does not support the CoE dictionary, but this is not mandatory. I suggest first studying the basic EtherCAT protocol. If you have no idea how to control your slave and how to properly build an EtherCAT master based on SOEM, this library is probably not for you.

monroe-git commented 3 years ago

@ArthurKetels Thank you for the answer. But @venkisagunner93 has already controlled motors with the EPOS4. I also tested the slave (MAXPOS) with the SOEM library. Here is the re-mapping part of the code.

/* Assign RxPDO 0x1600 to 0x1c12 (Sync Manager 2 PDO Assignment, outputs) */
os = sizeof(ob2);
ob2 = 0x1600;
wkc_count = ec_SDOwrite(k + 1, 0x1c12, 0x01, TRUE, os, &ob2, EC_TIMEOUTRXM);
if (wkc_count == 0)
{
    printf("RxPDO assignment error\n");
    return FALSE;
}

/* Assign TxPDO 0x1a00 to 0x1c13 (Sync Manager 3 PDO Assignment, inputs) */
os = sizeof(ob2);
ob2 = 0x1a00;
wkc_count = ec_SDOwrite(k + 1, 0x1c13, 0x01, TRUE, os, &ob2, EC_TIMEOUTRXM);
if (wkc_count == 0)
{
    printf("TxPDO assignment error\n");
    return FALSE;
}

venkisagunner93 commented 3 years ago

@monroe-git You can use SDO with EPOS4 and SOEM. Can you please tell me exactly what problem you are facing? Also, are you sure that you have the EtherCAT version of the EPOS4 (something like this: https://www.maxongroup.com/maxon/view/product/control/Positionierung/628094)? If you can point me to your code, I can take a look. By the way, this issue is only about "EtherCAT communication drops".

@ArthurKetels I tried the steps you mentioned, captured packets, and inspected them in Wireshark. I believe I saw some packet drops, and I'm not sure whether SOEM can fix them. Also, I have two separate ports (an SFP+ and an RJ45). With EtherCAT connected through the RJ45 port and the other peripherals connected through SFP+, I did not see any connection drops. That suggests the problem is probably in the operating system rather than in SOEM.

Additionally, I would like to ask you a question regarding timeouts mentioned in ec_send_processdata() and ec_receive_processdata(). If I see the issue again, can I relax the timeout value? If so, how much would you recommend?

venkisagunner93 commented 3 years ago

@monroe-git You can send me your github repo link here. I will take a look.

ArthurKetels commented 3 years ago

@venkisagunner93 , about timeouts. The maximum effective timeout is the addition of two timings, a) the return time of a packet going through all slaves and back again to the master, b) the maximum latency between receiving the packet and the processing in ec_receive_processdata().

Time a is almost fixed for a specific slave configuration and is measured by SOEM. It will not deviate more than a few hundred nanoseconds. A quick rule of thumb is 350ns per slave plus the transmission time of the packet (at 100Mb/s). Time b is more difficult to determine and depends a lot on your hardware and operating system. Bare metal implementations can reach maximum latency below 1us. But even real-time Linux can be as high as 200us. Outliers are an even harder problem: they do not happen often but can be extreme. It all depends on your definition of effective latency.

To determine the optimal timeout value for your system will take very rigorous and elaborate testing.
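To make the rule of thumb above concrete, here is a small sketch of the arithmetic. It assumes 350 ns per slave and 80 ns per byte on the wire (8 bits at 100 Mb/s), and it ignores preamble and inter-frame gap. The function names and the 100-byte frame in the usage note are hypothetical, chosen only for illustration, and the result is a starting estimate, not a substitute for measurement.

```c
#include <assert.h>
#include <stdint.h>

#define PER_SLAVE_DELAY_NS  350u  /* rule of thumb quoted in the thread */
#define NS_PER_BYTE_100MBIT 80u   /* 8 bits per byte at 100 Mb/s */

/* Time a: estimated return time of one frame through all slaves and back. */
uint64_t estimate_return_time_ns(uint32_t slaves, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);      /* an empty frame makes no sense */
    return (uint64_t)slaves * PER_SLAVE_DELAY_NS
         + (uint64_t)frame_bytes * NS_PER_BYTE_100MBIT;
}

/* Time a + time b: return time plus the worst-case latency before the
 * master processes the received frame. */
uint64_t estimate_timeout_ns(uint32_t slaves, uint32_t frame_bytes,
                             uint64_t worst_latency_ns)
{
    return estimate_return_time_ns(slaves, frame_bytes) + worst_latency_ns;
}
```

For 4 slaves and a 100-byte frame, `estimate_return_time_ns(4, 100)` gives 4 × 350 + 100 × 80 = 9400 ns. Adding a 200 us worst-case latency yields 209400 ns, which shows that time b dominates on a non-bare-metal system.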

nakarlsson commented 3 years ago

@venkisagunner93 , can we close this issue?

venkisagunner93 commented 3 years ago

The issue still persists. I'm not sure whether I have to change anything in my operating system or in the hardware on which I'm running SOEM. Let's close this for now. I will keep an eye on it and see whether I can reliably reproduce the issue.