MrYsLab / PyMata

A Python client class library for Interaction with Standard Firmata
GNU Affero General Public License v3.0
95 stars 40 forks source link

Race condition issue from accessing "command_deque" #43

Closed kskwong closed 5 years ago

kskwong commented 5 years ago

Hi,

I'm running the PyMata on my device, and I seem to be running a race condition associated with the access of "command_deque", between the "serial" thread and the "command handler" thread. It seems the race condition happens when "serial" pushes a new byte onto the "command_deque" object while "command_handler" pops a new byte from the "command_deque". The symptom I'm observing is that sometimes I either get a corrupted command for the "command_dispatch", or something seems to freeze one of the threads (uncertain of the exact behavior).

From tweaking the PyMata*.py code, it seems this behavior can be fixed by adding a new RLock that is assigned to whenever "command_deque" is being accessed, either by "serial" or "command_handler". For example, this is what I added:

  1. pymata.py:

    deque_lock = threading.RLock()
  2. pymata_serial.py

    with self.pymata.deque_lock:
    self.command_deque.append(ord(c))
  3. pymata_command_handler.py

    with self.pymata.deque_lock:
    sysex_command = self.pymata.command_deque.popleft()

Regards

MrYsLab commented 5 years ago

Could you provide the code that is causing this for you.? I would like to try and recreate the error.

MrYsLab commented 5 years ago

Also, please tell me the operating system you are using and the Python version. Also the model of Arduino you are using and the sketch you have loaded. If you are getting an exception, please provide the exception trace you are seeing.

A Python deque is thread-safe so I am guessing that by adding the locks you are changing the timing and that could be masking some other issue. That s why I need to see the code and understand how to recreate the problem.

kskwong commented 5 years ago

Thanks for the response! I have stripped down most of the code in my project down to just using standard firmata and pymata, and used a fresh Teensy to test this code on, but I haven't been able to replicate this issue at this time. It's possible some other component, either software, firmware, or hardware, that could have been the source of the issue I was observing.

I'll keep playing around with the "stripped down" code, and as soon as I can replicate this issue, I'll send it over!

To answer your other questions:

  1. Python version 3.5.4, Linux (Fedora 25)
  2. Teensy 3.5 (not arduino)
  3. Exception Trace:
    Exception in thread Thread-4:
    Traceback (most recent call last):
    File "/usr/lib64/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
    File "/usr/lib/python3.5/site-packages/PyMata/pymata_command_handler.py", line 845, in run
    method = dispatch_entry[0]
    TypeError: 'NoneType' object is not subscriptable

    From the exception trace above, it looks like it was not able to find the method from "dispatch_entry", which was supposedly defined before entering the "while-loop". This seems to hint at the fact that something might have been corrupted along the way, and that the host received an invalid sysex command. However, like I mentioned, I haven't been able to duplicate this issue with my stripped-down version of the code + isolated Teensy board.

Thank you for the comment on the deque being thread safe! That is another indication I'm likely doing something wrong. I'll get back to this as soon as I can!

MrYsLab commented 5 years ago

Thanks for the info and the exception trace. What you are seeing may have occurred due to a limitation of Arduino (and probably Teensy) not having a way to reset the device through software. This usually occurs when you start an analog input, exit the pymata application and restart it. The hardware continues to stream the input data, and pymata receives the data in mid-stream, so it is not a valid Firmata frame.

There are 2 ways to solve this. You may repower the Teensy after the program exits, or, before the program exits, disable reporting for all pins that were configured for digital and analog input.

kskwong commented 5 years ago

Ah I see! Thank you for these suggestions. I'll try looking into the suggestions as you have suggested, and see if it works for me! I'll close the issue for now. I'll comment here when I have further information.

P.S. I still haven't made the test code able to replicate the issue by this point, so definitely something else is going on.