Deniz-Eren / dev-can-linux

Porting of Linux CAN-bus drivers to QNX
GNU General Public License v2.0

Limit on the number of open/close calls (65535) by the driver? #56

Closed: phicore closed this issue 1 month ago

phicore commented 2 months ago

Describe the bug

We use version 1.2.0 of the driver with a PEAK-MiniPCIe CAN interface on QNX 7.1, running on an Arm64-based i.MX8QM SoM.

We have a stress test where CAN messages are sent from an external device every 10 ms, and a program (called can-loop) simply sends them back after altering the MID. This program also runs on the three internal CAN interfaces of the i.MX8QM, so we have four CAN interfaces running simultaneously, one of which uses the dev-can-linux driver.

We observed that after 65535 messages the dev-can-linux 1.2.0 driver stops responding. The problem is not in the client program: we have to kill the driver and relaunch it to get functionality back. Note: the can-loop program closes and reopens the device (mailboxes) attached to the dev-can-linux driver (/dev/can3) between each message, because we had issues keeping it open constantly.

To Reproduce

We launch the driver as follows:

'dev-can-linux -q -U3 -e 1c:08,0x05 -b id=3,freq=125k,btr0=0x07,btr1=0x14 &'

Then we launch our can-loop program.

Platform

Driver

A clearer test case is attached: Untitled3

Deniz-Eren commented 2 months ago

@phicore, if you remove the open/close for every message sent, after applying the temporary fix discussed in https://github.com/Deniz-Eren/dev-can-linux/issues/57, does the 65535 limit still occur?

I suspect the issue lies in the open/close handling. I will replicate it, then diagnose and fix it.

phicore commented 2 months ago

With the echo suppression, we no longer need to open and close the mailboxes for every message, so I suppose we can close this issue. Thanks again.

Deniz-Eren commented 2 months ago

It is still an issue and a good find; I will investigate further and fix it.

Deniz-Eren commented 1 month ago

No specific or hardcoded limits of 65536 or 2^16 exist in the code base.

The initial suspicion was that each open/close creates and destroys a client session. Each client session (created via open) has its own RX thread to deliver received messages. The QNX OS limit on the maximum number of threads per process is known to be 32767, so we investigated whether threads were not being cleaned up properly. However, this was not the case: all created threads are detached and shut down cleanly.

To test this theory further, we ran an experiment. The dev-can-linux driver was started and, while the number of active threads was monitored with the command "pidin -p pid", messages were sent with the command "cansend -u0,tx0 -m0x1234,1,0xABCD". No increase in the number of threads was observed.
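For illustration, the per-session thread pattern described above can be sketched in portable C with POSIX threads. This is not the driver's code: rx_thread_body, start_detached_rx_thread, wait_for_rx_threads and the live_rx_threads counter are hypothetical names, and the counter only stands in for watching the thread count with pidin. Because each thread is created detached, its resources are reclaimed when it returns, so repeated open/close cycles should not accumulate threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical counter of RX threads that have not yet finished their body;
   a stand-in for observing the thread count with pidin. */
static atomic_int live_rx_threads = 0;

/* Stand-in for a per-client RX thread. A real one would deliver received
   frames to its client before returning. */
static void *rx_thread_body(void *arg) {
    (void)arg;
    atomic_fetch_sub(&live_rx_threads, 1);
    return NULL; /* detached: resources are reclaimed automatically on exit */
}

/* Start one detached RX thread, as would happen on each open().
   Returns 0 on success or the pthread error code. */
static int start_detached_rx_thread(void) {
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0) {
        atomic_fetch_add(&live_rx_threads, 1);
        rc = pthread_create(&tid, &attr, rx_thread_body, NULL);
        if (rc != 0)
            atomic_fetch_sub(&live_rx_threads, 1); /* thread never started */
    }
    pthread_attr_destroy(&attr);
    return rc;
}

/* Poll (1 ms per poll, at most max_polls polls) until all RX threads have
   finished. Returns the number of threads still running. */
static int wait_for_rx_threads(int max_polls) {
    const struct timespec tick = {0, 1000000};
    for (int i = 0; i < max_polls && atomic_load(&live_rx_threads) > 0; i++)
        nanosleep(&tick, NULL);
    return atomic_load(&live_rx_threads);
}
```

Starting a thousand such threads in a row and then calling wait_for_rx_threads() should report zero threads left, matching the observation that the thread count does not grow.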

Next, to replicate the example code given in https://github.com/Deniz-Eren/dev-can-linux/issues/57, i.e. "void *fctThreadCan3( void *arg )", we wrote a new unit test, SingleSendReceiveAfterManyOpenClose, in tests/driver/io/driver-io-tests.cpp. It performs the same test as SingleSendReceive, but only after opening and closing the file descriptors 100,000 times.
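The shape of this reproduction can be sketched in plain POSIX C. This is not the project's unit test (which is C++ and uses the driver's own interface); open_close_many and still_writable are hypothetical helpers, and the device path is left to the caller because the real mailbox path is not assumed here. The iteration count of 100,000 matters only because it exceeds 65535.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open and close `path` `iterations` times, mimicking a client that reopens
   the device for every message. Returns the number of failed open() calls
   (0 means every open succeeded). */
static long open_close_many(const char *path, long iterations) {
    long failures = 0;
    for (long i = 0; i < iterations; i++) {
        int fd = open(path, O_RDWR);
        if (fd < 0) {
            failures++;
            continue;
        }
        close(fd);
    }
    return failures;
}

/* After the stress loop, check that the device still accepts a write.
   Returns 1 if all `len` bytes were written in a single call, else 0. */
static int still_writable(const char *path, const void *buf, size_t len) {
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return 0;
    ssize_t n = write(fd, buf, len);
    close(fd);
    return n == (ssize_t)len;
}
```

On a development host without CAN hardware, calling open_close_many("/dev/null", 100000) followed by still_writable("/dev/null", "x", 1) exercises the same loop; against the affected driver, the second step is where communication would be expected to fail.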

The exact problem described was successfully replicated by test SingleSendReceiveAfterManyOpenClose. After running the test, the command "candump -u0,rx0" no longer receives messages sent with the command "cansend -u0,tx0 -m0x1234,1,0xABCD".

During the test, the error message "rx_loop exit: Unable to attach to channel." was observed; it is produced at src/resmgr.c#L656.

The problem was traced to rx_loop, where the connection created by message_connect() was never disconnected. After adding the correct disconnection, the test shows that the problem is fixed.
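Why a missing disconnect eventually shows up as a hard limit can be illustrated with a small simulation. The sketch below does not use the QNX message_connect() API: conn_attach, conn_detach, conn_reset, simulate_sessions and the TABLE_SIZE of 64 are hypothetical stand-ins for a finite pool of channel connections, and no claim is made here about the actual limit inside QNX. With detach_on_close set to false, every open/close cycle leaks one entry, and attaching fails once the pool is exhausted (the analogue of "Unable to attach to channel"); with it set to true, the cycle can repeat indefinitely.

```c
#include <stdbool.h>

/* Hypothetical pool size; arbitrary, not the real QNX limit. */
#define TABLE_SIZE 64

static bool in_use[TABLE_SIZE];

/* Free every entry so simulations can be run independently. */
static void conn_reset(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        in_use[i] = false;
}

/* Take a free connection slot. Returns its id, or -1 when the pool is
   exhausted. */
static int conn_attach(void) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Return a connection slot to the pool. */
static void conn_detach(int id) {
    if (id >= 0 && id < TABLE_SIZE)
        in_use[id] = false;
}

/* Simulate `sessions` open/close cycles, each attaching one connection.
   With detach_on_close == false the connection leaks, as in the original
   rx_loop. Returns the index of the first session whose attach failed,
   or -1 if every session succeeded. */
static long simulate_sessions(long sessions, bool detach_on_close) {
    for (long s = 0; s < sessions; s++) {
        int id = conn_attach();
        if (id < 0)
            return s;
        /* ... an RX loop would deliver frames to the client here ... */
        if (detach_on_close)
            conn_detach(id);
    }
    return -1;
}
```

With the leak, the first failure occurs exactly at session TABLE_SIZE, regardless of how many sessions are requested; pairing every attach with a detach removes the ceiling.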

A further improvement was made by making the rx_loop thread non-detached (joinable), so that on close we can call pthread_join() for a cleaner shutdown.
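The joinable-thread shutdown can be sketched as follows, assuming a simple stop flag that the RX loop polls. The struct session type, rx_loop_sketch, session_open and session_close are hypothetical names rather than the driver's code, and a real RX loop would block waiting for frames instead of sleeping. Because the thread is created with default (joinable) attributes, session_close() waits in pthread_join() until the loop has exited and released its resources, rather than returning while a detached thread may still be tearing down.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client session state. */
struct session {
    pthread_t rx_thread;
    atomic_bool stop;
    atomic_long frames_delivered;
};

/* Stand-in RX loop: runs until asked to stop, then cleans up and returns. */
static void *rx_loop_sketch(void *arg) {
    struct session *s = arg;
    const struct timespec tick = {0, 1000000}; /* 1 ms */
    while (!atomic_load(&s->stop)) {
        /* A real loop would block here waiting for a CAN frame. */
        atomic_fetch_add(&s->frames_delivered, 1);
        nanosleep(&tick, NULL);
    }
    /* Release per-session resources (e.g. the channel connection) here,
       before the thread ends, so close() observes a fully cleaned state. */
    return NULL;
}

/* Called on open(): default attributes create a joinable thread. */
static int session_open(struct session *s) {
    atomic_init(&s->stop, false);
    atomic_init(&s->frames_delivered, 0);
    return pthread_create(&s->rx_thread, NULL, rx_loop_sketch, s);
}

/* Called on close(): signal the loop, then wait for it to finish. */
static int session_close(struct session *s) {
    atomic_store(&s->stop, true);
    return pthread_join(s->rx_thread, NULL);
}
```

If the real loop can block indefinitely waiting for a frame, close would also need some way to wake it before joining; how the driver does that is not described in this thread.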