OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master
Other
1.37k stars 690 forks source link

Error after going into operation mode #364

Closed VitalSys closed 4 years ago

VitalSys commented 4 years ago

I am able to get the simple test go all the way to operation mode, but after few millisecond I get the error 8303 on the Omron 1S drives front panel: Communications were not established consecutively because the synchronization with the EtherCAT Master could not be achieved.

My RT Thread timing is pretty accurate and the PDO is being sent at the right interval and time within couple of microseconds. My cycletime is 1ms.

While the PDOs are being exchanged, as soon as I run these two lines below, I get the 8303 error on the drives: ec_slave[0].state = EC_STATE_OPERATIONAL; ec_writestate(0);

My question is that in order to configure the PDOs, is this all I have to do for the servo drives: uint16 map_1c12[2] = { 0x0001, 0x1701 }; uint16 map_1c13[2] = { 0x0001, 0x1B01 };

ec_SDOwrite(slave, 0x1c12, 0x00, TRUE, sizeof(map_1c12), &map_1c12, EC_TIMEOUTSAFE); ec_SDOwrite(slave, 0x1c13, 0x00, TRUE, sizeof(map_1c13), &map_1c13, EC_TIMEOUTSAFE);

Do I have to configure any other item to complete the PDO config? I need the drives to use the Cyclic synchronous position mode. I believe the problem is not with timing, rather I am missing something in the PDO configuration.

I am fighting with this error for several weeks now. Will appreciate any insight so I can get out of this problem situation. Attached is the my code for simple test. Thank you. -Rufi

Ethercat_SimpleTest.txt

ArthurKetels commented 4 years ago

Thanks for sending the code. This is taking a lot of the guesswork out.

I see you want to do the right thing. But unfortunately synchronizing with ethercat is a bit more involved. On the PDO configuration I can not help because I do not have the documentation of the drive. Therefore I do not know the mandatory setup to configure for cyclic synchronous mode.

But I do spot some troubles in your test code.

First of all when in DC mode all slaves need time to distribute the clock among themselves and synchronize to the reference clock. In this case the first DC capable slave. This can be done best in safe-OP. Either wait a few seconds, or better, read the ECT_REG_DCSYSDIFF register of the last DC slave in the chain. When the time difference is below your minimum threshold for a certain time you are sure all slaves are locked in to the reference clock.

Second you also need to synchronize your master to the reference clock. In your code you do this only once at start-up. But your master clock will drift with respect to the slave reference clock. Look at the sample code red_test.c to see how you could solve this. In that example a simple PI controller is used to calculate the time offset between reference slave and master. This offset is then added to the PDO exchange timing.

Third, most synchronous slaves want proof that the master is sending the PDOs at acceptable time intervals. And they check that there is one and only one PDO exchange per sync0 cycle. To do this you have to do multiple cycles at correct synchronization while in safe-OP Only then an internal flag is set in the drive and the master is allowed to request OP mode. Some slaves have this flag in their PDO ,others do not, and then you have to guess a reasonable time to wait.

Fourth you need some timing margin between PDO transfer and sync0 event. When both happen at approximately the same time then it could happen that you end up with either no PDO exchange in the sync0 cycle or two. Both situations violating the checks in the slave (see above). I see you have put 0.9x cycle time of initial offset in the test code. This leaves 100usec margin.

Do not assume that the time your software sends the datagram is the same time as when the slave receives the datagram. And only the last counts. And it is easy to check because in DC mode the PDO exchange also reads out the slave reference clock. In red_test.c this return value is used in the synchronization loop. Thereby correcting for all unknown timing errors in the master stack and hardware.

My advise is to instrument all timings and clocks to see how your system behaves. Logging the timing of both master and all slaves is very insightful as to how all of this interacts.

VitalSys commented 4 years ago

Thanks Arthur for the insight. I will research your recommendations and report back soon.

One question, is the slave reference clock read by PDO exchange, is this the latched time on the first slave at sync0 event, or just the clock value at the time of sending the datagram from slave?

ArthurKetels commented 4 years ago

The datagram issued each PDO cycle is a read-multiple-write to ECT_REG_DCSYSTIME. This is the local slave clock. It reads the value of the first slave and writes the result to all following slaves. This is the method ehtercat uses to distribute the clock. But a side effect is that when the packet returns to the master it too can read the clock of the reference slave.

The clock gets captured at the moment the packet arrives at the first slave. The sync0 signal is not relevant. Because all of this is done in hardware the timing is very accurate (+-20ns).

ArthurKetels commented 4 years ago

A link to a public available poster from ETG that shows the basic working of EtherCAT: https://www.ethercat.org/download/documents/EtherCAT_Device_Protocol_Poster.pdf It also has a section about DC and SYNC

VitalSys commented 4 years ago

hello Arther, I checked the items you mentioned, but still no luck. As soon as I enable operation mode, within 2 or 3 pdo cycles, the error is generated. I verified that my PDO timing is spot on as I am now comparing with slave 1 clock using the ec_dctime and syncing the master clock. As far as the timing, it is perfect, Tx PDO landing in the middle of the cycle time of the slave. Here is the output of the code:

ec_config_init 0 Copy SII slave 3 from 2. Copy SII slave 4 from 2. 4 slaves found and configured. Found[1]: R88D-1SN01H-ECT EEPMAN 83 ID AE Found[2]: R88D-1SN04H-ECT EEPMAN 83 ID B0 Found[3]: R88D-1SN04H-ECT EEPMAN 83 ID B0 Found[4]: R88D-1SN04H-ECT EEPMAN 83 ID B0 Set PRE OP State............

PreOP

OMRON slave 1 set, retval = 3 OMRON slave 2 set, retval = 3 OMRON slave 3 set, retval = 3 OMRON slave 4 set, retval = 3 Configure DC options for every DC capable slave found in the list ec_config_map_group IOmap:0132CE20 group:0

Slave 1, configadr 1001, state 02 CoE Osize:96 Isize:224 Slave 2, configadr 1002, state 02 CoE Osize:96 Isize:224 Slave 3, configadr 1003, state 02 CoE Osize:96 Isize:224 Slave 4, configadr 1004, state 02 CoE Osize:96 Isize:224 ISIZE:224 224 OSIZE:96 SM programming SM2 Type:3 StartAddr:1100 Flags:00010064 SM3 Type:4 StartAddr:1200 Flags:00010020 ISIZE:224 224 OSIZE:96 SM programming SM2 Type:3 StartAddr:1100 Flags:00010064 SM3 Type:4 StartAddr:1200 Flags:00010020 ISIZE:224 224 OSIZE:96 SM programming SM2 Type:3 StartAddr:1100 Flags:00010064 SM3 Type:4 StartAddr:1200 Flags:00010020 ISIZE:224 224 OSIZE:96 SM programming SM2 Type:3 StartAddr:1100 Flags:00010064 SM3 Type:4 StartAddr:1200 Flags:00010020 OUTPUT MAPPING FMMU 0 SM2 slave 1 Outputs 0132CE20 startbit 0 OUTPUT MAPPING FMMU 0 SM2 slave 2 Outputs 0132CE2C startbit 0 OUTPUT MAPPING FMMU 0 SM2 slave 3 Outputs 0132CE38 startbit 0 OUTPUT MAPPING FMMU 0 SM2 slave 4 Outputs 0132CE44 startbit 0 =Slave 1, INPUT MAPPING FMMU 1 SM3 Inputs 0132CE50 startbit 0 =Slave 2, INPUT MAPPING FMMU 1 SM3 Inputs 0132CE6C startbit 0 =Slave 3, INPUT MAPPING FMMU 1 SM3 Inputs 0132CE88 startbit 0 =Slave 4, INPUT MAPPING FMMU 1 SM3 Inputs 0132CEA4 startbit 0 IOmapSize 160 Configure Sync0 Trigger time.............. Slave1 time = 18373946412576280 8000000 0 WKC value: 12, Expected 12 Launching RT Thread.............. Set SAFE OP State............ <>SafeOP Success<>.....Waiting .5sec....... Request OP State Mode = Operational Success ERROR: WorkCnt Missmatch wkc 4 ExpectedWkc 12 Slave 1 State=0x14 StatusCode=0x0034 : DC sync timeout error Slave 2 State=0x14 StatusCode=0x0034 : DC sync timeout error Slave 3 State=0x14 StatusCode=0x0034 : DC sync timeout error Slave 4 State=0x14 StatusCode=0x0034 : DC sync timeout error Shutting Down...... Restarting.............

Any further insight will be very helpful. Thx.

ArthurKetels commented 4 years ago

Most helpfull would be a wireshark capture of the packets being transmitted over the wire.. From your code sample I understand you use FreeRTOS, so wireshark does not run on your platform. But perhaps you could rewrite the test code for linux and try on a PC with linux?

There are simply too many variables to make an educated guess why this would not work. As I know OMRON they are very precise in keeping the EtherCAT standard. Most likely it is a simple mistake somewhere in your code.

BTW I see you used 8ms cycle time in your last test. Why?

ArthurKetels commented 4 years ago

After reading some more in the drive manual I found that statuscode 0x0034 is most likely caused by "Communications Synchronization Error" event code 8810 0000 hex. Can you check the event code?

It looks like your software and the drive are in disagreement about the PDO timing. Would it be possible to log the DCtime returned from each PDO transfer starting from the transition safe-OP->OP? Also include FirstSync0Time_ns.

ArthurKetels commented 4 years ago

Yet another comment. I also noted you did not write SDO object 0x1c32:01 "Synchronization Type" and 0x1c33:01. If the slave does not write them to value 2 then you need to do it. Best to check the value before requesting OP. Then you could also check 0x1c32:02 "cycle time" if it matches with what you set for sync0.

VitalSys commented 4 years ago

Thank you Arthur for your replies. I checked 1C32 and 1C33, all three sub indexes are reading zeros before going into OP mode. I write 2 to sync type, and I can then read it back as 2. But it does not make any difference. The 1C3x:02 “cycle time” register is read only and if I write on it I get the error for writing to a readonly object. I don’t know why it is read only, does not make much sense.

Attached is the wireshark capture. It was captured on windows platform using a managed switch capturing the traffic between the embedded controller and the drives. The time stamp is not correct in this log as it on windows.

At the end of the log, you can see some BWR and BRD. I think these are for setting the drives to operational mode.

wireshark_pdo_wkc_error.zip

VitalSys commented 4 years ago

The 8millisec cycle time I set to test the code using windows platform. on the embedded FreeRTOS based it is 1ms.

On embedded, the ec_DCtime shows 504us after sync 0 event (ec_DCtime - FirstSync0Time_ns) which is in the middle of the cycle. I also measured gap between successive ec_send_processdata and ec_dctime shows pretty close 1ms durations. see the watch variable picture.

ec_dctime_gaps

here is the drive pdf file. on page A-36 it says 1C32:02 is readonly.

https://assets.omron.eu/downloads/manual/en/i586_1s_series_users_manual_en.pdf

VitalSys commented 4 years ago

if you have teamViewer, may be you can log into my PC and see yourself whats going on. Thanks for all your help and time.

ArthurKetels commented 4 years ago

Hmm, timing sure does look all right. I double checked on the wireshark trace. Spot on in the middle of the 1ms cycle. Sync0 is also configured correctly.

What I did notice is that from the first PDO cycle error code object 603f is 0xff83. So the first thing come to mind is to reset the fault. Because object 6040 controlword is only active in OP you have to do this with SDO. My suggestion is at the same spot where you set object 1c32:01 you also reset the fault. It could be a simple case you are doing everything ok but a fault was latched and won't go away until you clear it.

VitalSys commented 4 years ago

ok I will try that. so to clear the fault, just write a zero to 6040 object by SDOWrite?

ArthurKetels commented 4 years ago

No, command and statusword in ds402 behave like a statemachine. You have to do it correctly. See for example http://www.simplerobot.net/2019/02/cia-402-motion-control-example-for.html

Although the code is not the best example, it shows the functionality. Also the omron manual gives more details about all the bit meanings and how they interact.

VitalSys commented 4 years ago

I tried the following change in ecx_dcsync0 and the magic happened. it is maintaining operational state for many hours of pdo excahnges on Omron 1S drives.

t = htoell( t & 0xFFFFFFFF ); (void)ecx_FPWR(context->port, slaveh, ECT_REG_DCSTART0, sizeof(t), &t, EC_TIMEOUTRET); / SYNC0 start time /

so looks like some drives only use the lower 32bits for DCtime.

Thanks for all your help and guidance.

ArthurKetels commented 4 years ago

Yeah, that would cause the observed problems. That is why I asked for the ESI file of the drive. It should state that DC is only 32 bits.

It is a bad solution for slave vendors to go with 32bit DC. When slaves are out for longer than 4.2s it is near to impossible to keep track of the correct DC.

Anyway, glad that you are up and running. And now we have a good record of how to get CSP mode working.

nakarlsson commented 4 years ago

we'll close cia 402 distributed clock