MEN-Mikro-Elektronik / 13MD05-90

MDIS5 System Package for Linux (including drivers)

mscan_pingpong does not work #228

Open dpfeuffer opened 3 years ago

dpfeuffer commented 3 years ago

We discovered the following problem with the MDIS MSCAN driver package: https://github.com/MEN-Mikro-Elektronik/13Z015-06/tree/master/DRIVERS/MDIS_LL/MSCAN

Description from our Problem Report MAIN_PR006912:

There are multiple bugs in the mscan_pingpong example application:

Test hardware: F215
Invocation: mscan_pingpong can_7 can_8
system.dsc is attached: system.dsc.txt

Please investigate this problem. If you need further support or input, I will give you the e-mail address of the colleague who discovered the problem.

dpfeuffer commented 2 years ago

The error does not occur with every call. We got about 4-5 errors per 100 calls of mscan_qstest (https://github.com/MEN-Mikro-Elektronik/13Z015-06/tree/master/DRIVERS/MDIS_LL/MSCAN/TOOLS/MSCAN_QSTEST/COM).

[Screenshots: mscan_qstest_call, mscan_qstest_err]

The problem is independent of the tool used (mscan_pingpong, mscan_qstest), so it seems to be a general MSCAN driver problem. I suggest performing a duration/loop test.
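A minimal sketch of such a loop test in C. It assumes the tool reports an error through a non-zero exit status, which this thread does not confirm; `count_failures` is a hypothetical helper name, not part of the MDIS sources.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run `cmd` through the shell `runs` times and return how many runs failed.
 * A run counts as failed if the command could not be started, was killed by
 * a signal, or exited with a non-zero status (assumed error convention). */
int count_failures(const char *cmd, int runs)
{
    int failures = 0;

    assert(cmd != NULL && runs >= 0);

    for (int i = 0; i < runs; i++) {
        int status = system(cmd);

        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            failures++;
            fprintf(stderr, "run %d of %d failed (wait status %d)\n",
                    i + 1, runs, status);
        }
    }
    return failures;
}
```

Calling, for example, `count_failures("mscan_pingpong can_7 can_8", 100)` would give the number of failed runs out of 100, which can be compared directly with the 4-5 errors per 100 calls reported above.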

GonzaloMartinR commented 2 years ago

I have been running batches of 25,000-50,000 tests and have been getting an average of 3 errors per 25,000 tests (about 0.012%), never reaching the 7% error rate quoted.

[Screenshot: Command]

After this, I made some minor modifications to the start of the mscan_qstest and mscan_pingpong tools to ensure that CAN communication is initialized every time a tool is launched. With this modification, the number of errors seems to have dropped to 1 or none per 50,000 tests.

[Screenshot: ClearCan]

The summary of the test results is as follows:

[Screenshot: results]

dpfeuffer commented 2 years ago

In our test scenario, many tests are carried out in parallel.

I assume you are using test setup #1 for the investigation, is that right?

Please perform the test with the F215 and G215. To increase the system load during the test, perform testing with F215 and G215 simultaneously and with additional tests running (e.g. UART tests). You could use the stress test as described in "Milestone 13MD05-90_02_04\13MD05-90_SwDeliveryAcceptanceReport.pdf" "ST_0300 Basic Stress Test" (available in instep) to increase the system load.

GonzaloMartinR commented 2 years ago

We believe we have found a solution to these errors: temporarily changing the scheduling policy.

[Image not reproduced]

With these changes we are consistently getting 0 errors regardless of the number of tests we run. We have pushed an update to the mad-dev-10_14 branch for both the mscan_pingpong and mscan_qstest tools.
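The exact change is not shown in this thread. As a rough sketch of what temporarily switching to a real-time policy could look like on Linux, assuming the standard `sched_setscheduler` call: `run_with_fifo`, `critical_fn` and `count_call` are hypothetical names, and the callback merely stands in for the timing-critical CAN exchange.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type for the timing-critical part of a tool. */
typedef void (*critical_fn)(void *arg);

/* Placeholder for the timing-critical CAN exchange; it only counts calls. */
void count_call(void *arg) { (*(int *)arg)++; }

/* Run fn(arg) under SCHED_FIFO if the process is allowed to, then restore
 * the previous policy. Returns 1 if the policy was raised, 0 if fn ran
 * under the unchanged policy (for example, missing privileges). */
int run_with_fifo(critical_fn fn, void *arg, int prio)
{
    struct sched_param old_param = {0};
    struct sched_param fifo_param;
    int elevated = 0;
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    int old_policy = sched_getscheduler(0);

    assert(fn != NULL);

    if (old_policy != -1 && sched_getparam(0, &old_param) == 0) {
        memset(&fifo_param, 0, sizeof(fifo_param));
        /* Clamp the requested priority to the valid SCHED_FIFO range. */
        fifo_param.sched_priority = prio < lo ? lo : (prio > hi ? hi : prio);
        if (sched_setscheduler(0, SCHED_FIFO, &fifo_param) == 0)
            elevated = 1;
        else
            fprintf(stderr, "SCHED_FIFO not set: %s\n", strerror(errno));
    }

    fn(arg);

    /* Drop back to the original policy so the rest of the tool cannot
     * starve other processes. */
    if (elevated && sched_setscheduler(0, old_policy, &old_param) != 0)
        fprintf(stderr, "could not restore policy: %s\n", strerror(errno));

    return elevated;
}
```

Keeping the real-time window limited to the callback bounds the time during which the tool can starve other processes. Setting SCHED_FIFO normally requires root or the CAP_SYS_NICE capability, so the sketch falls back to the current policy when the call is refused.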

dpfeuffer commented 2 years ago

It could be a workaround for the problem, but changing the scheduling policy to the real-time SCHED_FIFO policy could lead to other problems:

https://unix.stackexchange.com/questions/48519/negatives-of-running-processes-with-real-time-priority: "The most immediate downside of running a realtime process is that the process can easily starve out every other process on the system. The result from your point of view will be that the computer is completely unresponsive to keyboard, mouse, and probably network, for as long as the realtime process is using the CPU. This can happen if something goes wrong and the process goes into an infinite loop, or even temporarily if the process starts a long-running calculation without waiting for input periodically."

Because the MDIS drivers and tools are common source (used for Linux and VxWorks) it's not possible to insert native Linux function calls into the sources (without using #ifdef sections, which we want to avoid).

Maybe it is possible to achieve the same behavior by using the Linux chrt tool to set the SCHED_FIFO policy for mscan_pingpong. See https://man7.org/linux/man-pages/man1/chrt.1.html, e.g. > chrt -f <priority> mscan_pingpong ... (chrt expects a priority, 1-99 for SCHED_FIFO, before the command).

Please test whether you get the same results by calling the unmodified mscan_pingpong with chrt. My colleague tried this in his test setup just now, but he got the same error rate. He is using 6 CAN instances in his test setup. Please also test your modified mscan_pingpong with 2-3 mscan_pingpong calls running simultaneously (4-6 CAN instances).
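One way to start several calls at the same time is a small launcher. This is only a sketch: `run_parallel` is a hypothetical helper, and it assumes that a failing tool exits with a non-zero status.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start all `n` commands (each a NULL-terminated argv vector) at the same
 * time, wait for every child, and return the number that failed. */
int run_parallel(char *const *cmds[], int n)
{
    int failures = 0;
    int started = 0;

    assert(cmds != NULL && n >= 0);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == 0) {
            execvp(cmds[i][0], cmds[i]);
            _exit(127);                 /* only reached if exec failed */
        }
        if (pid < 0) {
            perror("fork");
            failures++;
        } else {
            started++;
        }
    }

    /* Collect the children in whatever order they finish. */
    for (int i = 0; i < started; i++) {
        int status;

        if (wait(&status) == -1 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

Each argv vector could hold, for instance, `{"chrt", "-f", "10", "mscan_pingpong", "can_7", "can_8", NULL}` with a different CAN pair per call. The priority 10 and any device names other than can_7 and can_8 are placeholders, not values from this thread.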

dpfeuffer commented 2 years ago

Today we decided to postpone the MSCAN driver problem. The investigation requires deep knowledge of the driver and FPGA IP core functionality and is time-consuming. We do not have the time to support you with this task at the moment. I am lowering the priority and removing the issue from the 13MD05-90_02_05 milestone.