OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master
Other
1.36k stars 688 forks source link

Syncronize multiple slaves with Sync0 #487

Closed maceta980 closed 3 years ago

maceta980 commented 3 years ago

Hello everyone!

I have tried to control individually different slaves and they work fine, but I had individually as well problems synchronizing the Sync0. Trying to go to OP state, I was checking the statusword of the servo and when the Sync0 was not well done (statusword says me as it is an alarm), I was closing the socket and restarting the SOEM automatically (with a simple goto statement) and finally after 1, 2 or 3 times restarting the SOEM, the servo synchronization was fine and the servo was starting to move. With just one servo, this solution was working "fine", however, now I am trying to control two servos and the problem is just increasing. When maybe the first servo synchronization is fine, the second servo synchronization is bad. Does someone know the reason of this? Why sometimes the Sync0 is well done and sometimes is not when the cofig is exactly the same?

I had set for all slaves that the PDO cycle is 8ms and I am using the ecatthread (as on the red_test) to synchronize this. Thanks.

ArthurKetels commented 3 years ago

As SOEM developers we get lots of questions relating to synchronization. Often the example programs or the self crafted code do not perform as well as they should. Resulting in some slaves that behave badly. Some times there are clear guidelines from the slave manufacturer how to achieve and check for proper synchronization. Some slaves are very forgiving and bad practices get unnoticed. But sometimes all bad things come together and the user is left wondering what did go wrong.

First of all I believe a slave manufacturer that supports synchronized slaves needs to specify clearly what the timing and synchronization constraints are. Also when the constraints are violated a clear and descriptive error should be generated. More and more slaves are emerging that do not bother to emit any error at all.

As a SOEM developer I can not do much to change the slave vendor behavior. But you as a user can (and should) complain. You are entitled to a properly implemented and tested device. And if complaining does not help, go elsewhere.

What I can do is to explain how to get the best possible synchronization and timing in your application.

Synchronization is about synchronizing different clocks. In EtherCAT every device has it's own clock, master and slaves. And each clock runs slightly different from other clocks. There are two methods to synchronize. One is to change the clocks speed to make them exactly equal. This is extremely difficult. The other solution (and the one used by EtherCAT) is to measure the deviation and calculate a correction. More precise to have one reference clock and distribute it to all others to observe. Then from that observation each other clock can then calculate its offset and its drift.

For EtherCAT slaves this method is supported by the ESC hardware. By reading the current clock value from the reference slave and writing it to the other slave clock it takes the difference and drift. These values are then used to feed a closed loop system that generates a new clock that is almost synchronized to the reference clock. Almost, because the observation is not perfect and the closed loop is digital and jitters a bit. It also takes a while before the slave clock is in sync with the reference. How much time will depend on the number of observations and the accuracy of the clocks. The accuracy of slave clocks is defined in the EtherCAT specification to be less then 20ppm. But even then it will take several thousand of observations for the clock to be settled in.

After the slave clock is synchronized to the reference clock it will take only a few observations per time unit to keep it very tight. As a rule of thumb 100 observations per second is more than enough. But to get a decent lock quickly it is recommended to send at least 10.000 clock distributions as quick as possible. This is what TwinCAT does. It will take less than a second.

The best way to make sure all slave clocks are locked in is to measure the internal closed loop parameters. Register System Time Difference (0x092C:0x092F) is the obvious one to test. When it's value drops below one microsecond and stays below it then you have reached reasonable synchronization. It is the task of the application programmer (and not SOEM) to monitor this for all slaves. Only when this constraint is satisfied try to start the sync0/1 generation.

There is one other important clock that has no hardware support for synchronization, the master clock. This is most often the clock of your operating system. And it is the clock you have the least control over. Often it is abused in all sorts of ways. For your OS/system try to find a clock that is least tampered with. The offset and drift are not very important, as long it is clean. Meaning the OS does not try to change it in an effort to synchronize it to something external.

Then deploy the same principle as the EtherCAT slaves. Observe the reference clock and measure the drift and offset of your OS clock. Use those two numbers to synchronize your real-time process data cycle. Monitor the absolute offset of your PDO cycle with respect to the reference clock. When it is within bounds (as set by the slave manufacturer, or otherwise <100us), and stays within bounds then your master timing constraint is satisfied.

Then you need to place the transmission of the master PDO cycle in the time slot of the slave (again by slave manufacturer data). For example when the sync0 is fired every 1000us the PDO transmission reaches the slave at 500us offset. The exact value is slave and configuration dependent. And when you have many slaves you have to account for propagation delay. The first slave will get its data earlier than the last slave. And all slaves need their constraints satisfied.

Again depending on the slave requirements the master is allowed a maximum number of successive deviations from this timing requirement. Some slaves have this at zero (very critical), others have them at dozens. So be sure your master is timing stable with little jitter.

By protocol definition the set-up and stabilization of clocks and sync0 can be done in safe-OP. Although some slaves want to have all constraints satisfied at the pre-OP to safe-OP transition. All constraints must be satisfied in OP state. The safe-OP state is often used by slaves to measure the timing quality of the master. Only when all PDO transmissions arrive in the designated time slots the slave will accept the synchronization. Good slaves will indicate this, or at least throw a clear and descriptive error when not. Lesser implementations will transition to OP and fail without any error. Keep in mind that some slaves will take up to 10 seconds or more to measure. Forcing safe-OP to OP before that will cause the slave to fail.

The complexity of the above is the main reason why I could not come up with a sample application that fits all unknown requirements and is still easy to understand. It is up to you to understand and implement accordingly. Just retry a piece of code in the hope it works is not a very robust solution. Thus the reason for your question is clear but the solution is in better understanding the principles. I hope my write-up helps a bit.

maceta980 commented 3 years ago

Thank you so much Arthur. Finally after reading your message and many old issues as well I realized that it's better to configure de DC clock before mapping the slaves (in Pre-OP). I was first mapping the slaves and then configuring the DC clock (I think that I took it from an example). So right now the slaves go to OP without no error at the first attempt!

However, (sometimes, not always) after some seconds, the slave number 2 has a network communication error, as I think that due to some delays, the slave 2 PDO cycle doesn't match. Something important to show it's the result of the slaveinfo program of both slaves:

Slave:1 Delay: 0[ns] Slave:2 Delay: 760[ns]

As it can be seen, the slave 2 has a 760ns of delay. Could it be this the problem? And how could I fix the problem? I am using the ecatthread to synchronize the slaves. Thanks!

I am attaching the wireshark just in case. WireShark.zip

ArthurKetels commented 3 years ago

Well no wonder your synchronization is not stable, you are breaching several of the rules I pointed out above. It is for you to find out which. Go by them step by step, and check if you have implemented them all.

Two slaves are not an issue for propagation delay. You have at least 50us to play with even with critical slave timing. 0.76us is not putting a dent in your timing budget.