Deniz-Eren / dev-can-linux

Porting of Linux CAN-bus drivers to QNX
GNU General Public License v2.0

Fix for bug https://github.com/Deniz-Eren/dev-can-linux/issues/56 #60

Closed Deniz-Eren closed 1 month ago

Deniz-Eren commented 1 month ago

No specific or hardcoded limits of 65536 or 2^16 exist in the code base.

The initial suspicion was that each open/close creates and destroys a client session. Each client session owns an RX thread (one per client connection made via open) that delivers the received messages. QNX is known to limit a process to a maximum of 32767 threads, so we investigated whether the threads were not being cleaned up properly. That was not the case, however: all the threads created are detached and shut down cleanly.

To test this theory further, we ran an experiment. The driver dev-can-linux was started and, while monitoring the number of active threads with the command "pidin -p pid", messages were sent with the command "cansend -u0,tx0 -m0x1234,1,0xABCD". No increase in the number of threads was observed.

Next, to replicate the example code given in https://github.com/Deniz-Eren/dev-can-linux/issues/57, i.e. "void fctThreadCan3( void arg )", we wrote a new unit test, SingleSendReceiveAfterManyOpenClose, in tests/driver/io/driver-io-tests.cpp. It performs the same test as SingleSendReceive, but only after opening and closing the file descriptors 100,000 times.
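The open/close stress step of such a test can be sketched in plain POSIX C as below. This is a minimal illustration, not the code of SingleSendReceiveAfterManyOpenClose: the helper name open_close_cycles is hypothetical, and the path passed in is whatever device node the caller wants to exercise.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: opens and immediately closes `path` `iterations`
 * times and returns how many open() calls failed. A driver that leaks a
 * per-open resource would typically start failing (or stop delivering
 * messages) after enough cycles. */
static int open_close_cycles(const char *path, int iterations)
{
    assert(path != NULL);

    int failures = 0;
    for (int i = 0; i < iterations; ++i) {
        int fd = open(path, O_RDWR);
        if (fd == -1) {
            ++failures;   /* count the failure and keep cycling */
            continue;
        }
        close(fd);
    }
    return failures;
}
```

A test would call this against the driver's device node with 100,000 iterations and then run the usual single send/receive check.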

The exact problem described in https://github.com/Deniz-Eren/dev-can-linux/issues/56 was successfully replicated by the test SingleSendReceiveAfterManyOpenClose. After running the test, the command "candump -u0,rx0" no longer receives messages from the command "cansend -u0,tx0 -m0x1234,1,0xABCD".

During the test, the error message "rx_loop exit: Unable to attach to channel." was noted. It is produced at https://github.com/Deniz-Eren/dev-can-linux/blob/1bb3cfede13e847c98a309da3b5fbc415c4694ce/src/resmgr.c#L656

The problem was identified in rx_loop: the connection made by message_connect() was never being disconnected. After adding the correct disconnection, the problem is shown to be fixed.
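Why a missing disconnect only shows up after many open/close cycles can be illustrated with a toy model. Everything in the sketch is hypothetical: toy_connect() and toy_disconnect() stand in for the driver's connect/disconnect pair, and POOL_SIZE is an arbitrary small number chosen for the demonstration, not a QNX limit.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 8  /* hypothetical number of connection slots */

static bool slot_used[POOL_SIZE];

/* Marks every slot as free again. */
static void toy_reset(void)
{
    for (int i = 0; i < POOL_SIZE; ++i)
        slot_used[i] = false;
}

/* Returns a free slot index, or -1 when the pool is exhausted. */
static int toy_connect(void)
{
    for (int i = 0; i < POOL_SIZE; ++i) {
        if (!slot_used[i]) {
            slot_used[i] = true;
            return i;
        }
    }
    return -1;
}

/* Returns a slot to the pool. */
static void toy_disconnect(int id)
{
    assert(id >= 0 && id < POOL_SIZE);
    slot_used[id] = false;
}

/* Simulates `sessions` receive-loop lifetimes, one after another, and
 * returns how many of them failed to connect. */
static int run_sessions(int sessions, bool disconnect_on_exit)
{
    int failed = 0;
    for (int i = 0; i < sessions; ++i) {
        int id = toy_connect();
        if (id < 0) {
            ++failed;  /* analogous to "Unable to attach to channel." */
            continue;
        }
        /* ... the loop would deliver received messages here ... */
        if (disconnect_on_exit)
            toy_disconnect(id);
    }
    return failed;
}
```

With the disconnect in place, any number of sequential sessions succeeds. Without it, every session after the first POOL_SIZE fails, even though no two sessions are ever alive at the same time.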

As a further improvement, the rx_loop thread was made non-detached so that on close we can pthread_join() it for a smoother clean-up.
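The joinable-thread pattern can be sketched with POSIX threads as follows. This is an assumption-laden illustration rather than the driver's code: session_t, rx_worker, session_open and session_close are hypothetical names, and the usleep() call merely stands in for waiting on a received message.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-client session state. */
typedef struct {
    pthread_t   thread;    /* joinable receive thread */
    atomic_bool stop;      /* set by close to request shutdown */
    atomic_bool finished;  /* set by the thread once its clean-up is done */
} session_t;

static void *rx_worker(void *arg)
{
    session_t *s = arg;

    while (!atomic_load(&s->stop))
        usleep(1000);  /* stand-in for blocking on a received message */

    /* Per-thread resources (e.g. a connection) would be released here. */
    atomic_store(&s->finished, true);
    return NULL;
}

/* Starts the receive thread with default (joinable) attributes. */
static int session_open(session_t *s)
{
    assert(s != NULL);
    atomic_init(&s->stop, false);
    atomic_init(&s->finished, false);
    return pthread_create(&s->thread, NULL, rx_worker, s);
}

/* Requests shutdown and waits until the thread has fully exited. */
static int session_close(session_t *s)
{
    atomic_store(&s->stop, true);
    return pthread_join(s->thread, NULL);
}
```

Because session_close() returns only after pthread_join() succeeds, the caller knows the thread's clean-up has completed before the session memory is reused or freed, which a detached thread cannot guarantee.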