Closed jespersmith closed 7 years ago
Depends on the nicdrv implementation. What OS are you using? Most nicdrv drivers have a mutex guarding the TX of a frame, and that would cause the blocking. Depending on the OS, you can get rid of the mutex by making some variables local stack variables.
There shouldn't be any problem with multiple datagrams on the line. If you add a switch between the master and the first slave, you could verify whether the WKC reply is wrong, or whether it gets trashed in SOEM or dropped by the OS.
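To illustrate the mutex-versus-stack-variable point, here is a minimal sketch. It is not SOEM's nicdrv code: `FRAME_LEN`, `build_frame_shared` and `build_frame_local` are hypothetical names, and the "frame" is just a buffer filled with one byte value.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 64  /* hypothetical frame size, for illustration only */

/* Variant A: one shared scratch buffer, so every caller must take the mutex. */
static uint8_t shared_buf[FRAME_LEN];
static pthread_mutex_t tx_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Fills `out` with FRAME_LEN bytes of `tag`; blocks while another
 * thread holds tx_mutex. */
void build_frame_shared(uint8_t tag, uint8_t out[FRAME_LEN])
{
    pthread_mutex_lock(&tx_mutex);
    memset(shared_buf, tag, FRAME_LEN);
    memcpy(out, shared_buf, FRAME_LEN);
    pthread_mutex_unlock(&tx_mutex);
}

/* Variant B: the scratch buffer lives on the caller's stack, so the
 * function is reentrant and needs no lock. */
void build_frame_local(uint8_t tag, uint8_t out[FRAME_LEN])
{
    uint8_t local_buf[FRAME_LEN];
    memset(local_buf, tag, FRAME_LEN);
    memcpy(out, local_buf, FRAME_LEN);
}
```

With variant B a high-priority PDO thread never waits on a lock held by a low-priority SDO thread. Whether the actual send call on the socket is safe to use concurrently still depends on the OS, so treat this as a sketch of the idea only.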
We're running this under Linux (Intel e1000e NIC, kernel 3.18 with rt preempt)
On VxWorks I've run a PDO task in parallel with an SDO task, both with a resolution of 1 ms and the PDO task at higher priority. No problems. That would indicate that this issue is in the NIC area, somewhere between nicdrv and the e1000 driver. As I suggested earlier, make use of a switch to figure out what happens.
Ok, so I did some more research. I used different network cards (including a cheap USB dongle) with no difference in behavior. Instead of a switch I used Wireshark; no actual packets seem to be lost.
Observations
Ok, I'd try to make nicdrv TX/RX reentrant without mutexes and see how it works.
For now I have put the EtherCAT state control in the same thread as the controller, running no more than one EtherCAT transaction per control cycle. It only needs to do work at startup; after that it stays idle until we get WKC errors (which should never happen).
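A rough sketch of that single-threaded arrangement, assuming a small state machine that spends at most one acyclic transaction per control cycle. All names here (`housekeeping_t`, `control_cycle`, `fake_exchange`, the `HK_*` steps) are hypothetical, and the bus access is replaced by a stand-in callback so the sketch compiles without SOEM.

```c
#include <stdbool.h>

/* Hypothetical housekeeping steps; each non-idle step stands for
 * exactly one acyclic bus transaction. */
typedef enum { HK_IDLE, HK_READ_STATE, HK_REQUEST_OP, HK_WAIT_OP } hk_step_t;

typedef struct {
    hk_step_t step;
    int transactions;  /* acyclic transactions issued so far */
} housekeeping_t;

/* Stand-in for the cyclic PDO exchange; returns the received WKC. */
typedef int (*pdo_exchange_fn)(void *ctx);

/* Advance housekeeping by at most one step, i.e. one transaction. */
static void housekeeping_step(housekeeping_t *hk, bool wkc_ok)
{
    switch (hk->step) {
    case HK_IDLE:        /* no bus traffic while everything is fine */
        if (!wkc_ok) hk->step = HK_READ_STATE;
        break;
    case HK_READ_STATE:  hk->transactions++; hk->step = HK_REQUEST_OP; break;
    case HK_REQUEST_OP:  hk->transactions++; hk->step = HK_WAIT_OP;    break;
    case HK_WAIT_OP:     hk->transactions++; hk->step = HK_IDLE;       break;
    }
}

/* One control cycle: PDO exchange first, then at most one
 * housekeeping transaction, both in the same thread. */
int control_cycle(housekeeping_t *hk, pdo_exchange_fn exchange,
                  void *ctx, int expected_wkc)
{
    int wkc = exchange(ctx);
    housekeeping_step(hk, wkc >= expected_wkc);
    return wkc;
}

/* Test stand-in: reports the WKC stored behind ctx. */
int fake_exchange(void *ctx) { return *(int *)ctx; }
```

Because both kinds of traffic are issued from one thread, two datagrams from the master are never in flight at the same time, at the cost of slower state handling.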
We're running up to some major deadlines and I won't have time to test much till November. I would like to revisit this issue then.
I had similar issues before with Linux and the e1000 driver. The NAPI layer will coalesce multiple received frames until a configurable time-out has expired, which leads to receive time-outs in SOEM. Disabling all IRQ mitigation mechanisms in the driver will improve things a lot.
@jespersmith can this issue be closed? From SOEM's point of view all tuning possibilities have been mentioned.
I made the controller single threaded. I'm closing this issue for now.
So I have implemented my master to have two threads, the main EtherCAT loop and the housekeeping thread. The EtherCAT loop runs send/receive in a realtime context with proper timing. The housekeeping thread does state checking for the EtherCAT slaves as in the following pseudocode (it does more, but this is the main part)
I have two issues with this system: 1) SDO access sometimes blocks or times out the cyclic PDO exchange. This happens on the Elmo Twitter slave. 2) I'm developing a new slave using the Infineon XMC4300 and the Beckhoff SSC. If multiple slaves are connected and I read ECT_REG_DCSYSDIFF in parallel, the slave will lose datagrams and refuse to go into OP mode.
I'm wondering if reading registers in parallel is acceptable, and what happens if a datagram is on the line and I send a new one before it gets back to the master. It seems that datagrams get dropped or I get a working counter that differs from the expected one when this happens.
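To tell a dropped frame from a wrong working counter, it can help to classify each cycle's result explicitly. The sketch below follows the convention used in SOEM's sample code, where the expected WKC of an LRW exchange is twice the number of slaves with outputs plus the number of slaves with inputs. The names `expected_wkc`, `classify_wkc` and `wkc_result_t` are hypothetical, and treating a negative value as "no frame" assumes the receive call reports a missing frame that way (SOEM's `EC_NOFRAME` is -1).

```c
/* Expected WKC for an LRW exchange: a successful write counts 2,
 * a successful read counts 1. */
int expected_wkc(int slaves_with_outputs, int slaves_with_inputs)
{
    return slaves_with_outputs * 2 + slaves_with_inputs;
}

typedef enum { WKC_OK, WKC_TIMEOUT, WKC_MISMATCH } wkc_result_t;

/* Separate "frame never came back" from "frame came back, but not
 * every slave processed it". */
wkc_result_t classify_wkc(int received, int expected)
{
    if (received < 0)
        return WKC_TIMEOUT;   /* assumption: negative means no frame */
    if (received < expected)
        return WKC_MISMATCH;  /* some slave did not read or write */
    return WKC_OK;
}
```

Counting `WKC_TIMEOUT` and `WKC_MISMATCH` separately over many cycles gives a first hint whether frames are lost on the master side or whether a slave is not processing them.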