OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master

How to check if the interface is disconnected? #809

Closed open04 closed 2 weeks ago

open04 commented 1 month ago

Hi, I'm trying to check every cycle whether my interface is disconnected. Here are some approaches I have already tried.

  1. In my processdata thread, I always check the return value of ec_receive_processdata():

    bool interface_disconnected = false;

    void processdata()
    {
        while (1)
        {
            ec_send_processdata();
            if (ec_receive_processdata(EC_TIMEOUTRET) > 0)
                printf("Good");
            else
            {
                interface_disconnected = true;
                break;
            }
            osal_usleep(1000);
        }
    }

But when I tested it, it did not work: the process data exchange kept running even after I disconnected the interface from the host PC.

  2. Looking at the sample code (simple_test), I tried the islost flag:

    bool interface_disconnected = false;

    void processdata()
    {
        while (1)
        {
            ec_send_processdata();
            if (ec_receive_processdata(EC_TIMEOUTRET) > 0)
                printf("Good");
            if (ec_slave[0].islost)
            {
                interface_disconnected = true;
                break;
            }
            osal_usleep(1000);
        }
    }


I got an error after disconnecting the interface (sorry, I didn't capture it).

  3. [Haven't tried this yet, and I would rather avoid this method]

    bool interface_disconnected = false;

    void processdata()
    {
        while (1)
        {
            ec_send_processdata();
            if (ec_receive_processdata(EC_TIMEOUTRET) > 0)
                printf("Good");
            /* Slaves are numbered 1..ec_slavecount; index 0 is the master. */
            for (int i = 1; i <= ec_slavecount; i++)
            {
                /* Only set the flag; clearing it here would hide earlier lost slaves. */
                if (ec_slave[i].islost)
                    interface_disconnected = true;
            }
            if (interface_disconnected)
                break;
            osal_usleep(1000);
        }
    }


I haven't tried it yet since it is the weekend. I suspect the error I get with number 2 is because I set the slave number to 0. I am working with multiple slaves (around 10) and I assumed that slave number 0 means checking all of the slaves.

Is there any other function or method that I can try?

ArthurKetels commented 1 month ago

The proper way is to check the working counter (WKC) that is returned from ec_receive_processdata(). The master sends the counter with value 0 and each slave that communicates adds 1, 2, or 3 to the WKC. It is then a simple check to see how many slaves have responded. If none respond you get either 0 or -1 (no packet received, time-out).
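As a minimal sketch of that check: the simple_test example computes the expected value as `(ec_group[0].outputsWKC * 2) + ec_group[0].inputsWKC` and compares it with the value returned by ec_receive_processdata(). The helper names below (`expected_wkc`, `classify_wkc`, `link_status_t`) are hypothetical and not part of SOEM; they are plain C so the logic can be tried without a network.

```c
/* Hypothetical helper types and functions, not part of the SOEM API. */
typedef enum {
    LINK_OK,          /* WKC reached the expected value                */
    LINK_PARTIAL,     /* frame returned, but some slaves did not count */
    LINK_NO_RESPONSE, /* WKC == 0: frame returned, no slave processed  */
    LINK_TIMEOUT      /* WKC < 0: no frame returned within the timeout */
} link_status_t;

/* Same formula as simple_test: outputs count twice, inputs once. */
int expected_wkc(int outputs_wkc, int inputs_wkc)
{
    return (outputs_wkc * 2) + inputs_wkc;
}

/* Classify the value returned by ec_receive_processdata(). */
link_status_t classify_wkc(int wkc, int expected)
{
    if (wkc < 0)
        return LINK_TIMEOUT;
    if (wkc == 0)
        return LINK_NO_RESPONSE;
    if (wkc < expected)
        return LINK_PARTIAL;
    return LINK_OK;
}
```

In the cyclic thread this would be called as `classify_wkc(ec_receive_processdata(EC_TIMEOUTRET), expected)`, with `expected` computed once after the slaves reach OPERATIONAL. A disconnected interface would normally show up as LINK_TIMEOUT.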

open04 commented 1 month ago

Thanks Arthur. I forgot to check the flag in my main, which is why the program did not exit. It works now!

open04 commented 1 month ago

Hi Arthur, I am reopening this issue since I think this is related to my first question.

I implemented the working counter check in my program, but both before and after implementing it, I noticed that my board goes into an error state after idling for about 1.5 hours.

Scenario:

  1. I run my own program.
  2. NMT state is on OPERATIONAL.
  3. FSA state is also on Operation
  4. Send/receive processdata runs in a separate thread.
  5. The working counter returned by ec_receive_processdata() is printed.
  6. Error checking (ecatcheck()) is not implemented in my program.
  7. No commands are sent via PDO or SDO; the program just constantly reads input PDOs.
  8. The board's red LED starts blinking after about 1.5 hours.
  9. The working counter is -1.

I don't know if idle is the correct term, because I am constantly sending and receiving process data, but I noticed that this always happens when I have not sent any PDO/SDO commands for about 1.5 hours.

ArthurKetels commented 1 month ago

A working counter return value of -1 indicates that no return packets from the slaves have arrived at the master during the timeout period. Now the question is: is this a single event, a burst, or continuous?

Each of the above can have multiple root causes, so you have to dig deeper. See what the ifconfig tool reports. Look for link status or dropped packets. Have a look with Wireshark to see what happens with the packets.
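One way to tell a single event from a burst or a continuous failure is to count consecutive cycles in which the working counter is negative. The sketch below is only an illustration: the tracker type, the function name, and both thresholds are assumptions and would need tuning to the cycle time (at a 1 ms cycle, 1000 consecutive timeouts is roughly one second).

```c
/* Hypothetical tracker, not part of SOEM. Thresholds are assumptions. */
#define BURST_MIN      2u     /* consecutive timeouts that count as a burst */
#define CONTINUOUS_MIN 1000u  /* consecutive timeouts treated as continuous */

typedef enum { LOSS_NONE, LOSS_SINGLE, LOSS_BURST, LOSS_CONTINUOUS } loss_kind_t;

typedef struct {
    unsigned consecutive; /* timeouts in the current run      */
    unsigned total;       /* timeouts since the tracker reset */
    unsigned longest_run; /* longest run of timeouts seen     */
} loss_tracker_t;

/* Feed the WKC of every cycle; returns the classification so far. */
loss_kind_t loss_tracker_update(loss_tracker_t *t, int wkc)
{
    if (wkc >= 0) {
        /* A frame came back, so the current run of timeouts has ended. */
        t->consecutive = 0;
        return LOSS_NONE;
    }
    t->consecutive++;
    t->total++;
    if (t->consecutive > t->longest_run)
        t->longest_run = t->consecutive;
    if (t->consecutive >= CONTINUOUS_MIN)
        return LOSS_CONTINUOUS;
    if (t->consecutive >= BURST_MIN)
        return LOSS_BURST;
    return LOSS_SINGLE;
}
```

Logging `total` and `longest_run` together with a timestamp makes it easier to line the events up with what ifconfig and Wireshark show.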

SOEM itself is unlikely to be the cause. I have had systems continuously running for over a year.

ecatcheck is a simple example of how you could implement some error recovery. It can also help finding the cause of communication errors. But it will not catch all issues.

For example, I once had a system where I got disconnected from the slaves after some time. It turned out to be a network daemon that checked for regular traffic and, to save power, brought the line down if it found none. The problem is that EtherCAT is not normal traffic (not TCP or UDP), so it is not recognised by the daemon. Removing the daemon from the system solved the issue.
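Coming back to ecatcheck: its recovery logic in simple_test boils down to a per-slave decision based on the slave state and the islost flag. The sketch below is loosely modelled on that example and is not a copy of it. The names `should_run_check`, `choose_action`, and `check_action_t` are hypothetical, and the state constants are local copies so the block compiles without SOEM. The rate limit in `should_run_check` is an assumption: the state and recovery calls block with their own timeouts, so calling them every cycle would likely disturb the process data timing.

```c
#include <stdbool.h>

/* Local copies mirroring EC_STATE_NONE, EC_STATE_SAFE_OP,
   EC_STATE_OPERATIONAL and EC_STATE_ERROR from the SOEM headers. */
enum {
    ST_NONE        = 0x00,
    ST_SAFE_OP     = 0x04,
    ST_OPERATIONAL = 0x08,
    ST_ERROR       = 0x10
};

/* Hypothetical action codes; comments name the SOEM call ecatcheck uses. */
typedef enum {
    ACT_NOTHING,     /* slave is operational                          */
    ACT_ACK_ERROR,   /* ec_writestate() with SAFE_OP + ACK            */
    ACT_REQUEST_OP,  /* ec_writestate() with OPERATIONAL              */
    ACT_RECONFIGURE, /* ec_reconfig_slave()                           */
    ACT_MARK_LOST,   /* no state readable: set the islost flag        */
    ACT_RECOVER,     /* ec_recover_slave()                            */
    ACT_CLEAR_LOST   /* slave answers again: clear the islost flag    */
} check_action_t;

/* Run the slow check only when the WKC is short, and not too often. */
bool should_run_check(int wkc, int expected_wkc,
                      unsigned cycles_since_check, unsigned min_interval_cycles)
{
    if (wkc >= expected_wkc)
        return false;
    return cycles_since_check >= min_interval_cycles;
}

/* Decide what to do for one slave after its state has been read. */
check_action_t choose_action(int state, bool is_lost)
{
    if (is_lost)
        return (state == ST_NONE) ? ACT_RECOVER : ACT_CLEAR_LOST;
    if (state == ST_OPERATIONAL)
        return ACT_NOTHING;
    if (state == (ST_SAFE_OP + ST_ERROR))
        return ACT_ACK_ERROR;
    if (state == ST_SAFE_OP)
        return ACT_REQUEST_OP;
    if (state > ST_NONE)
        return ACT_RECONFIGURE;
    return ACT_MARK_LOST;
}
```

Keeping the decision separate from the SOEM calls makes it possible to run the same logic from a dedicated thread or from the process data loop, depending on how much jitter the application tolerates.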

open04 commented 1 month ago

Thanks Arthur,

Regarding ecatcheck(), is it okay to merge the checks into my processdata thread? I am limiting thread creation in my program.