SDO read/write and state machine from different threads

jespersmith commented 8 years ago

So I have implemented my master with two threads: the main EtherCAT loop and a housekeeping thread. The EtherCAT loop runs send/receive in a realtime context with proper timing. The housekeeping thread does state checking for the EtherCAT slaves, as in the following pseudocode (it does more, but this is the main part):

while(true)
{
  if(not switched to OP OR expected working counter is not received working counter)
    readstate(0)
  for each slave
    if(slave in SAFE_OP)
      read ECT_REG_DCSYSDIFF
      if ECT_REG_DCSYSDIFF < 5000
        switch slave to OP
      end if
    end if
  end for
  for each SDO I want to access
    access SDO
  end for

  sleep(10ms);
}
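The two checks in the loop above could be sketched in C roughly as follows. This is only an illustration, not code from the master in question: the helper names are hypothetical, and the decoding assumes that the DC system time difference register is a 32-bit sign-magnitude value (bit 31 = sign, bits 30:0 = mean difference in ns), which is how I read the ESC register description. If that is right, a plain `< 5000` comparison on the raw value would misjudge differences that have the sign bit set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the state machine needs attention when not all slaves
 * are in OP, or when the received working counter differs from the expected one. */
bool needs_state_check(bool all_in_op, int expected_wkc, int received_wkc)
{
    return !all_in_op || received_wkc != expected_wkc;
}

/* Assumption: the raw register value is sign-magnitude, with bit 31 as the
 * sign and bits 30:0 as the magnitude in nanoseconds. */
int32_t dc_sysdiff_to_ns(uint32_t raw)
{
    int32_t magnitude = (int32_t)(raw & 0x7FFFFFFFu);
    return (raw & 0x80000000u) ? -magnitude : magnitude;
}

/* True when the absolute time difference is strictly below limit_ns. */
bool dc_in_sync(uint32_t raw, int32_t limit_ns)
{
    int32_t diff = dc_sysdiff_to_ns(raw);
    return diff > -limit_ns && diff < limit_ns;
}
```

With these helpers, the "switch slave to OP" step would be gated on `dc_in_sync(raw, 5000)` instead of comparing the raw register value directly.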

I have two issues with this system: 1) The SDO access sometimes blocks the cyclic PDO exchange or causes it to time out. This happens on the Elmo Twitter slave. 2) I'm developing a new slave using the Infineon XMC4300 and the Beckhoff SSC. If multiple slaves are connected and I read ECT_REG_DCSYSDIFF in parallel, the slave will lose datagrams and refuse to go into OP mode.

I'm wondering whether reading registers in parallel is acceptable, and what happens if a datagram is on the line and I send a new one before the first gets back to the master. It seems that datagrams get dropped, or that the received working counter does not match the expected one, when this happens.

nakarlsson commented 8 years ago

That depends on the nicdrv implementation. What OS are you using? Most nicdrv drivers have a mutex guarding the TX of a frame, which would cause the blocking. Depending on the OS, you can get rid of the mutex by making some variables local stack variables.

There shouldn't be any problem with multiple datagrams on the line. If you add a switch between the master and the first slave, you can verify whether the WKC reply is wrong, or whether it gets trashed in SOEM or dropped by the OS.

jespersmith commented 8 years ago

We're running this under Linux (Intel e1000e NIC, kernel 3.18 with rt preempt)

nakarlsson commented 8 years ago

On VxWorks I've run a PDO task in parallel with an SDO task, both with a resolution of 1 ms and the PDO task with higher priority. No problems. That would indicate that this issue is in the NIC area, somewhere from nicdrv to the e1000 driver. As I suggested earlier, make use of a switch to figure out what happens.

jespersmith commented 8 years ago

OK, so I did some more research. I used different network cards (including a cheap USB dongle) with no difference in behavior. Instead of a switch I used Wireshark, and no actual packets seem to be lost.

Observations

Increasing the timeout on ecx_receive_processdata to something high like 4000 will result in no dropped packets and no invalid working counters.
However, the jitter on the control thread becomes pretty big and slaves will have synchronization errors. It seems that the SDO/readstate/FPRW calls result in large delays on the control thread.
The issue is also evident with the Beckhoff slaves I have here. However, they go into OP mode anyway so I stop calling statecheck and reading ECT_REG_DCSYSDIFF.
Putting mutexes around send/receive and the PDO tasks avoids the lost packets, even if separate mutexes are used for send and receive. However, this comes at the cost of higher jitter.
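For reference, the separate send/receive locking described in the last observation could look roughly like the sketch below. The names are hypothetical and the callback type merely stands in for whatever frame send/receive calls the master uses; this is not the actual nicdrv code.

```c
#include <pthread.h>

/* Hypothetical callback type standing in for a frame send or receive call. */
typedef int (*frame_fn)(void *ctx);

/* Separate locks, so a sender never waits for a receiver and vice versa. */
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Serializes all senders; returns whatever the wrapped call returns. */
int locked_send(frame_fn send_frame, void *ctx)
{
    pthread_mutex_lock(&tx_lock);
    int rc = send_frame(ctx);
    pthread_mutex_unlock(&tx_lock);
    return rc;
}

/* Serializes all receivers, independently of the senders. */
int locked_receive(frame_fn receive_frame, void *ctx)
{
    pthread_mutex_lock(&rx_lock);
    int rc = receive_frame(ctx);
    pthread_mutex_unlock(&rx_lock);
    return rc;
}

/* Stand-in frame function for trying the wrappers: counts its invocations. */
int count_call(void *ctx)
{
    int *calls = ctx;
    return ++*calls;
}
```

The realtime thread can still block on either lock while the other thread holds it, which would be consistent with the higher jitter noted above.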

nakarlsson commented 8 years ago

Ok, I'd try to make nic_drv TX/RX reentrant without mutexes and see how it works.

jespersmith commented 8 years ago

For now I have put the EtherCAT state control in the same thread as the controller, running no more than one EtherCAT transaction per control cycle. It only needs to do work at startup; after that it stays idle until we get WKC errors (which should never happen).
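A minimal sketch of that "at most one transaction per control cycle" idea, assuming the state checks and SDO accesses are queued as small jobs. All names here are hypothetical and nothing in it is SOEM API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job: performs exactly one acyclic EtherCAT transaction. */
typedef void (*job_fn)(void *ctx);

typedef struct {
    job_fn fn;
    void *ctx;
} job_t;

#define JOB_QUEUE_LEN 16

/* Fixed-size ring buffer, so the realtime loop never allocates memory. */
typedef struct {
    job_t jobs[JOB_QUEUE_LEN];
    size_t head;
    size_t count;
} job_queue_t;

/* Queues a job; returns false when the queue is full. */
bool job_push(job_queue_t *q, job_fn fn, void *ctx)
{
    if (q->count == JOB_QUEUE_LEN)
        return false;
    size_t tail = (q->head + q->count) % JOB_QUEUE_LEN;
    q->jobs[tail].fn = fn;
    q->jobs[tail].ctx = ctx;
    q->count++;
    return true;
}

/* Call once per control cycle, after the process data exchange.
 * Runs at most one queued job; returns false when nothing was pending. */
bool job_run_one(job_queue_t *q)
{
    if (q->count == 0)
        return false;
    job_t job = q->jobs[q->head];
    q->head = (q->head + 1) % JOB_QUEUE_LEN;
    q->count--;
    job.fn(job.ctx);
    return true;
}

/* Stand-in job for trying the queue: counts its invocations. */
void count_job(void *ctx)
{
    int *calls = ctx;
    ++*calls;
}
```

Because only one thread touches the NIC in this design, no locking is needed, at the price of acyclic traffic being spread over several cycles.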

We're running up to some major deadlines and I won't have time to test much till November. I would like to revisit this issue then.

ArthurKetels commented 8 years ago

I had similar issues before with Linux and the e1000 driver. The NAPI layer will coalesce multiple received frames until a configurable time-out has expired, leading to receive time-outs in SOEM. Disabling all IRQ mitigation mechanisms in the driver will improve things a lot.

nakarlsson commented 7 years ago

@jespersmith can this issue be closed? From SOEM's point of view, all tuning possibilities have been mentioned.

jespersmith commented 7 years ago

I made the controller single threaded. I'm closing this issue for now.

OpenEtherCATsociety / SOEM

SDO read/write and state machine from different threads #55