OpenEtherCATsociety / SOEM

Simple Open Source EtherCAT Master

Timing issue (despite PREEMPT RT patch) #330

Closed miscz closed 4 years ago

miscz commented 5 years ago

Hello!

I am facing a problem similar to the one robotixdeveloper described in issue #296. Running in cyclic sync position mode, I am losing packets and the servo is making cracking noises. My setup is also very similar: a Lenovo T460s with a Core i5 and an Intel Ethernet Connection I219-V. Following the instructions from #296 and others, I did the following:

void set_realtime_priority() {

     int ret;

     // We'll operate on the currently running thread.
     pthread_t this_thread = pthread_self();

     // struct sched_param is used to store the scheduling priority
     struct sched_param params;

     // We'll set the priority to the maximum.
     params.sched_priority = sched_get_priority_max(SCHED_FIFO);

     std::cout << "Trying to set thread realtime prio = " << params.sched_priority << std::endl;

     // Attempt to set thread real-time priority to the SCHED_FIFO policy
     ret = pthread_setschedparam(this_thread, SCHED_FIFO, &params);
     if (ret != 0) {
         // Print the error
         std::cout << "Unsuccessful in setting thread realtime prio" << std::endl;
         return;
     }

     // Now verify the change in thread priority
     int policy = 0;
     ret = pthread_getschedparam(this_thread, &policy, &params);
     if (ret != 0) {
         std::cout << "Couldn't retrieve real-time scheduling parameters" << std::endl;
         return;
     }

     // Check the correct policy was applied
     if(policy != SCHED_FIFO) {
         std::cout << "Scheduling is NOT SCHED_FIFO!" << std::endl;
     } else {
         std::cout << "SCHED_FIFO OK" << std::endl;
     }

     // Print thread scheduling priority
     std::cout << "Thread priority is " << params.sched_priority << std::endl;
}

clock_gettime(CLOCK_MONOTONIC, &tspec_now);
cycletime[i] = (tspec_now.tv_sec*NSEC_PER_SEC + tspec_now.tv_nsec)-(tspec_b4.tv_sec*NSEC_PER_SEC + tspec_b4.tv_nsec);
ec_send_processdata();
clock_gettime(CLOCK_MONOTONIC, &tspec_b4);
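
The subtraction above depends on how NSEC_PER_SEC is defined, which is not shown here. As a minimal sketch, the elapsed time could be computed with a small helper that subtracts seconds and nanoseconds separately in 64-bit arithmetic. The name timespec_diff_ns is hypothetical and not part of SOEM.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical helper (not part of SOEM): signed difference
// (later - earlier) in nanoseconds.
// Subtracting the seconds first keeps the intermediate values small,
// and the explicit 64-bit casts avoid overflow in the multiplication.
inline int64_t timespec_diff_ns(const timespec& later, const timespec& earlier) {
    const int64_t nsec_per_sec = 1000000000LL;
    const int64_t sec  = static_cast<int64_t>(later.tv_sec)  - static_cast<int64_t>(earlier.tv_sec);
    const int64_t nsec = static_cast<int64_t>(later.tv_nsec) - static_cast<int64_t>(earlier.tv_nsec);
    return sec * nsec_per_sec + nsec;
}
```

With this helper the measurement would read cycletime[i] = timespec_diff_ns(tspec_now, tspec_b4);, assuming cycletime holds 64-bit integers.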

The test program runs for 20 seconds with a cycle time of 2 milliseconds. I made plots of the cycle time with its histogram [images: cycletime, cycletime_hist], of toff with its histogram [images: toff, toff_hist], and of ec_DCtime [image: DCtime].

You can also find the Wireshark capture of this test here: test3.pcapng.txt

Any suggestions as to what might be wrong with my code or the setup? Is it correct that ec_DCtime shows modulo behavior, i.e. that it has a limited range?

Thank you for your help!

miscz commented 5 years ago

By the way, here is a cyclictest result for the RT kernel: [image: cyclictest_for_rt]

miscz commented 5 years ago

Does anyone have a suggestion?

I think one reason could be that my kernel is not configured correctly. So I would try to get an older RT kernel that is closer to my non-RT kernel version, to avoid major changes in the kernel config.

But one issue I still have is the modulo behavior of ec_DCtime. Is it correct as shown in the diagram? I expected to see a straight line like the identity function. It seems that a lot of the cracking noises occur when ec_DCtime wraps from the very high number 4.xE+09 to 0. Might this cause a problem in the ec_sync() function with its PI parameters, which would result in skipping some packets because toff is getting too high?

ArthurKetels commented 5 years ago

Sorry for not answering earlier. OS timing issues are not exactly on topic for SOEM.

"But one issue I still have is the modulo behavior of ec_DCtime."

This is an issue indeed. The sample code assumes a 64-bit DCtime. Some slaves only provide 32 bits. It is up to you to extend this to 64 bits. Only then can you take the direct difference as in the example.

Whether you extend it or not, some code change is necessary.
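
A 32-bit counter in nanoseconds wraps every 2^32 ns, about 4.29 s, which matches the 4.xE+09 peak in the plot. With a 2 ms cycle, 2^32 mod 2,000,000 = 967,296, so any phase computed as DC time modulo the cycle time would jump by that amount at each wrap. One way to extend the value is sketched below, assuming the application reads the lower 32 bits at least once per wrap period. The type DcTimeExtender is hypothetical and not part of SOEM.

```cpp
#include <cstdint>

// Hypothetical helper (not part of SOEM): turns a wrapping 32-bit
// DC time into a monotonically increasing 64-bit value.
// Assumption: update() is called at least once per 2^32 ns (~4.29 s),
// so at most one wrap can happen between two calls.
struct DcTimeExtender {
    uint64_t extended = 0;   // accumulated 64-bit time
    uint32_t last_low = 0;   // previous 32-bit sample
    bool initialized = false;

    uint64_t update(uint32_t low) {
        if (!initialized) {
            extended = low;
            initialized = true;
        } else {
            // Unsigned subtraction is modulo 2^32, so the difference is
            // correct even when the counter wrapped since the last sample.
            const uint32_t delta = low - last_low;
            extended += delta;
        }
        last_low = low;
        return extended;
    }
};
```

In a cyclic loop this might be called once per cycle, for example with static_cast<uint32_t>(ec_DCtime), and the returned value used wherever a 64-bit DC time is expected.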

Next to that, you do have some configuration issues. A Linux kernel with the preempt-rt patch should have a max latency jitter around 50 us on modern hardware. My advice is to stick to a vanilla kernel and patch that with the latest preempt-rt patch set. Configure a minimal system with debugging off. CPU isolation for non-critical processes might help too. Keep kernel processes at a higher priority than your SOEM application (SOEM needs the kernel to be responsive).

Priority stack (highest first):
1. Hardware drivers
2. Kernel
3. SOEM + application
4. All others
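
The set_realtime_priority() function in the first post requests the maximum SCHED_FIFO priority, which puts the application above the kernel threads rather than below them. A minimal sketch of choosing a lower value follows. It assumes the relevant kernel and IRQ threads run at SCHED_FIFO priority 50, a common PREEMPT_RT default that should be checked on the actual system. Both function names are hypothetical.

```cpp
#include <pthread.h>
#include <sched.h>
#include <algorithm>

// Hypothetical helper: a SCHED_FIFO priority one step below the given
// kernel/IRQ thread priority, clamped to the range the OS allows.
inline int app_priority_below(int kernel_thread_prio) {
    const int lo = sched_get_priority_min(SCHED_FIFO);
    const int hi = sched_get_priority_max(SCHED_FIFO);
    return std::max(lo, std::min(hi, kernel_thread_prio - 1));
}

// Hypothetical helper: applies SCHED_FIFO with the given priority.
// Returns 0 on success, otherwise an error number
// (typically EPERM when the process lacks real-time privileges).
inline int set_fifo_priority(pthread_t thread, int prio) {
    sched_param params{};
    params.sched_priority = prio;
    return pthread_setschedparam(thread, SCHED_FIFO, &params);
}
```

For example, set_fifo_priority(pthread_self(), app_priority_below(50)) would request priority 49 for the calling thread.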

miscz commented 5 years ago

Thank you for your answer! I will try to reconfigure the kernel.

Regarding the DC: Can you tell me which register the DC uses, so I can check whether it is 32 or 64 bits long?

PaninP commented 5 years ago

May I ask how you obtained the histogram data from the Ethernet port? I would like to investigate my own timing issue too. Thank you for your help.
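
The thread does not say how the plots were produced. The cycletime[i] array in the first post suggests the samples were recorded inside the test program, so one possible approach is to bin those values after the run. The function below is a hypothetical sketch of that binning, not the method actually used. Values outside the range are counted in the first or last bin.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts samples into nbins bins of width bin_width,
// starting at lo. Out-of-range samples are clamped to the first/last bin.
inline std::vector<std::size_t> histogram(const std::vector<int64_t>& samples,
                                          int64_t lo, int64_t bin_width,
                                          std::size_t nbins) {
    std::vector<std::size_t> counts(nbins, 0);
    if (nbins == 0 || bin_width <= 0) {
        return counts;  // nothing sensible to compute
    }
    for (int64_t s : samples) {
        std::size_t idx = 0;  // samples at or below lo land in bin 0
        if (s > lo) {
            const uint64_t raw = static_cast<uint64_t>(s - lo) /
                                 static_cast<uint64_t>(bin_width);
            idx = raw >= nbins ? nbins - 1 : static_cast<std::size_t>(raw);
        }
        ++counts[idx];
    }
    return counts;
}
```

For a 2 ms cycle, bins of 1000 ns starting at 1,900,000 ns would be one plausible choice. The counts can then be written to a file and plotted with any tool.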

tecodrive commented 5 years ago

I use Ubuntu Server 18.04 with the low-latency kernel. Important: no monitor or USB device (keyboard, mouse) is connected. The NIC is an Intel i210at or i211at, and I use the isolcpus kernel parameter to isolate a physical core. If hyperthreading is active, you must isolate both logical CPUs that share a physical core, e.g. (0 & 2) or (1 & 3) on a machine with two physical cores. The NIC's rx and tx interrupts should be assigned to the isolated CPU, and the RT thread runs on the isolated CPU. All non-RT threads and interrupts should run on the non-isolated CPUs. My machine runs with a 250 us cycle time and a maximum latency below 5 us. In each cycle I first use a while loop to catch the exact next clock tick, then send and receive the EtherCAT PDOs, then sleep for ~70% of the cycle time with clock_nanosleep.
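
The pinning and the sleep-then-spin wait described above could look roughly like the following Linux-specific sketch. It is not tecodrive's code. All function names are hypothetical, and the CPU number and the 70/30 split are taken from the description rather than from any measured tuning.

```cpp
#include <sched.h>
#include <cerrno>
#include <cstdint>
#include <ctime>

// Pin the calling thread to one CPU. Returns true on success.
inline bool pin_current_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Add a (possibly negative) number of nanoseconds to a timespec.
inline timespec add_ns(timespec t, int64_t ns) {
    const int64_t nsec_per_sec = 1000000000LL;
    const int64_t total = static_cast<int64_t>(t.tv_nsec) + ns;
    t.tv_sec += static_cast<time_t>(total / nsec_per_sec);
    t.tv_nsec = static_cast<long>(total % nsec_per_sec);
    if (t.tv_nsec < 0) {  // normalise a negative remainder
        t.tv_nsec += nsec_per_sec;
        t.tv_sec -= 1;
    }
    return t;
}

// True once `now` is at or past `deadline`.
inline bool reached(const timespec& now, const timespec& deadline) {
    return now.tv_sec > deadline.tv_sec ||
           (now.tv_sec == deadline.tv_sec && now.tv_nsec >= deadline.tv_nsec);
}

// Sleep until (deadline - spin_ns), then busy-wait on the clock
// until the deadline itself has been reached.
inline void wait_until(const timespec& deadline, int64_t spin_ns) {
    const timespec wake = add_ns(deadline, -spin_ns);
    // clock_nanosleep returns the error number directly; retry on EINTR.
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wake, nullptr) == EINTR) {
    }
    timespec now;
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (!reached(now, deadline));
}
```

For a 250 us cycle, a loop might advance the deadline with add_ns(deadline, 250000) each cycle and call wait_until(deadline, 75000), so that roughly 70% of the cycle is spent sleeping and the rest spinning.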

I219 is very bad because of the very high latency on the receiving side. No chance to get performance.

Note the following: Do not use SDOwrite or SDOread in the same RT thread. Both send a lot of EtherCAT frames and wait for the answer, so your RT thread is blocked. However, running SDOwrite or SDOread on a separate, non-RT thread also gives you high latencies: if a non-RT thread has access to the NIC and holds a mutex, the RT thread is blocked. The trick is a small patch to SOEM that introduces a delay into the SDO commands. The RT thread then reads all incoming frames (SDO and PDO) from ec_receive_processdata.

Linux attempts to raise the priority of a non-RT thread that holds a mutex when an RT thread requests that mutex. However, the timing of this process is very non-deterministic. I tried it and ran into these problems.

The RT preempt patch is unnecessary. Everything important in the patch for Intel processors is contained in the low-latency kernel.
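
The behaviour described here corresponds to priority inheritance, which POSIX threads expose per mutex through the PTHREAD_PRIO_INHERIT protocol. The sketch below only illustrates how such a mutex is created. The thread does not show how the mutexes in question were configured, the function name is hypothetical, and the example does not address the timing problems reported above.

```cpp
#include <pthread.h>

// Hypothetical helper: initialise `m` as a priority-inheritance mutex.
// Returns 0 on success, otherwise the error number from the failing call.
inline int init_pi_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    // While a higher-priority thread waits on the mutex, the holder
    // temporarily runs at that waiter's priority.
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0) {
        rc = pthread_mutex_init(m, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

A mutex created this way is locked and unlocked with the usual pthread_mutex_lock and pthread_mutex_unlock calls.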

nakarlsson commented 4 years ago

Can we close this issue?

nakarlsson commented 4 years ago

Inactive