MyTooliT / ICOc

ICOc is a tool to control the ICOtronic system, acquire data, and test stationary transceiver units and sensory tool holders.
https://mytoolit.github.io/ICOc/

Performance Problems When Receiving Streaming Data #40

Closed sanssecours closed 1 month ago

sanssecours commented 1 year ago

Description

It seems that the “new” Network class (based on python-can) has performance problems when we use the PCAN interface (the standard interface for PEAK CAN adapters on macOS and Windows) to stream data to the computer. After some time the code raises a PcanCanOperationError stating that the “receive queue was read too late”.

The same code works fine using the SocketCAN interface on Linux.

Steps to Reproduce

  1. Install ICOc in development mode

    pip install -e .[dev,test]
  2. Check out commit 4b260bce for test script new.py

  3. Execute the following command in the repository

    python mytoolit/experiments/new.py

Expected Result

The script finishes without any problems.

Actual Result

After a certain amount of time the script prints an error message that looks like this:

Exception in thread can.notifier for bus "PCAN_USBBUS1":
Traceback (most recent call last):
  File "…threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "…threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "…site-packages/can/notifier.py", line 122, in _rx_thread
    msg = bus.recv(self.timeout)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "…site-packages/can/bus.py", line 98, in recv
    msg, already_filtered = self._recv_internal(timeout=time_left)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "…site-packages/can/interfaces/pcan/pcan.py", line 472, in _recv_internal
    raise PcanCanOperationError(self._get_formatted_error(result[0]))
can.interfaces.pcan.pcan.PcanCanOperationError: The receive queue was read too late

Additional Information

I tested the script three times on each of the (more or less) supported operating systems. One reason behind the failure (on macOS and Windows) might be the additional code (on top of python-can) in this repository. However, replacing it with something like pass did not solve the problem (on macOS) when I tried, and the CPU utilization stayed pretty much the same as far as I can tell.

Regarding the CPU usage: Please take the values below with a grain of salt. I “measured” the CPU utilization using the graphical system monitor software provided by the different operating systems. This process is obviously not very accurate, especially since modern systems contain multiple CPU cores. In the case of the M1 machine below, these cores even have very different performance characteristics.

Linux

System

Test Results

CPU Usage: ~ 80% (of single core)

Test Result
Test 1 ✅ Works
Test 2 ✅ Works
Test 3 ✅ Works

macOS

System

CPU Usage: ~ 60% – 65% (of single core)

Test Result
Test 1 ❌ Failure after ca. 44 seconds
Test 2 ❌ Failure after ca. 2 minutes 45 seconds
Test 3 ❌ Failure after ca. 3 minutes 57 seconds

Windows

System

Test Results

CPU Usage: ~ 60% – 80% (of single core)

Test Result
Test 1 ❌ Failure after ca. 1 minute 10 seconds
Test 2 ❌ Failure after ca. 1 minute 10 seconds
Test 3 ❌ Failure after ca. 1 minute 4 seconds

sanssecours commented 1 year ago

Update

The same problem also occurs with the SocketCAN interface on Linux. The only difference seems to be that the SocketCAN backend drops CAN packets without raising an exception. That is also the reason why I thought everything was working fine.

Lessons Learned

Always check the measurement data for lost packets.
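One way to follow this advice is to look for gaps in a per-message sequence counter. The sketch below assumes that every streaming message carries a counter that increases by one per message and wraps around at 256. The function name `count_lost_messages` and the modulus are assumptions for illustration, not part of ICOc or python-can.

```python
def count_lost_messages(counters, modulus=256):
    """Count messages missing from a sequence of wrapping message counters.

    Assumption: consecutive messages carry counters that increase by one
    and wrap around at `modulus`.
    """
    lost = 0
    previous = None
    for counter in counters:
        if previous is not None:
            # Distance to the previous counter; a distance of 1 means no gap
            step = (counter - previous) % modulus
            # A repeated counter (step == 0) cannot be told apart from a full
            # wrap-around, so it is counted as `modulus - 1` lost messages
            lost += (step - 1) % modulus
        previous = counter
    return lost


if __name__ == "__main__":
    # Counters 2 and 3 are missing
    print(count_lost_messages([0, 1, 4, 5]))
```

A check like this would have revealed the dropped packets on Linux, where the backend does not raise an exception. Gaps of a full counter period or more are underestimated, so the result is a lower bound.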

sanssecours commented 8 months ago

Performance Measurement

Measuring performance on one of the efficiency cores of the M1 MacBook with:

taskpolicy -b fish -c 'icon measure -t 10 -n Test-STH'

showed about 30% CPU usage without any code in AsyncStreamBuffer.on_message_received.

sanssecours commented 8 months ago

Ideas to Improve Performance

We are already working on ideas to improve the performance in the branch 🐌, which should also be the basis for version 2.0 of the package (see also issue #51). So far we have simplified the data structure returned by the streaming API to keep the performance problems in check.

Other ideas:

Note: It might make sense to count the number of instructions for a certain number of messages.
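Counting CPU instructions directly is not possible from pure Python, so the sketch below swaps in a rougher proxy: the average CPU time a handler needs per message, measured with `time.process_time_ns`. The function name `cpu_time_per_message` and the dummy handler are hypothetical and only serve as illustration.

```python
import time


def cpu_time_per_message(handler, messages):
    """Return the average CPU time (in nanoseconds) `handler` uses per message.

    `messages` has to be a non-empty sequence.
    """
    start = time.process_time_ns()
    for message in messages:
        handler(message)
    elapsed = time.process_time_ns() - start
    return elapsed / len(messages)


if __name__ == "__main__":
    # Dummy handler that only sums up the payload bytes of each message
    payloads = [bytes(range(8))] * 100_000
    print(f"{cpu_time_per_message(sum, payloads):.0f} ns per message")
```

The resolution of the process clock depends on the platform, so the number of messages should be large enough for the total run time to be well above that resolution.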

sanssecours commented 1 month ago

In theory the latest changes (currently in the branch 🌱; planned for release 2.0) should improve the performance slightly. Another big advantage of the new code is that it raises an exception if the buffer for streaming data exceeds a certain fixed size (currently 10 000 elements).
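The idea of failing loudly instead of silently falling behind can be sketched roughly as follows. This is not the code from the branch; the class names `BoundedStreamBuffer` and `StreamBufferOverflowError` are hypothetical, and only the limit of 10 000 elements is taken from the comment above.

```python
from collections import deque


class StreamBufferOverflowError(Exception):
    """Raised if the consumer falls too far behind (hypothetical name)"""


class BoundedStreamBuffer:
    """FIFO buffer that raises an exception instead of growing without limit"""

    def __init__(self, max_size=10_000):
        self.max_size = max_size
        self._queue = deque()

    def put(self, item):
        """Store an item, or raise if the buffer already holds `max_size` items"""
        if len(self._queue) >= self.max_size:
            raise StreamBufferOverflowError(
                f"Buffer already contains {self.max_size} elements"
            )
        self._queue.append(item)

    def get(self):
        """Remove and return the oldest item"""
        return self._queue.popleft()

    def __len__(self):
        return len(self._queue)


if __name__ == "__main__":
    buffer = BoundedStreamBuffer(max_size=2)
    buffer.put("message 1")
    buffer.put("message 2")
    try:
        buffer.put("message 3")
    except StreamBufferOverflowError as error:
        print(f"Overflow: {error}")
```

Raising on overflow means a reader that is too slow shows up as an explicit error on every platform, instead of as silently dropped data.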

sanssecours commented 1 month ago

Now that a performance test is part of the extended manual tests, I will close this issue for now.