Harsch-Systems / node-red-contrib-pi-plates

Control Pi-Plates from Node-RED
Apache License 2.0

DAQC2plate - ADC Nodes lockup if external python script to toggle DOUT pins is executed #22

Open KD4Z opened 2 years ago

KD4Z commented 2 years ago

This may not be an issue with node-red-contrib-pi-plates, however...

The attached example flow merely reads two ADC values. Import it into Node-RED, configure the Pi-Plate, and deploy. Verify that the ADC nodes are getting values.

Log into a shell on the Pi running Node-RED. Create wigglepin.py from the attachment:

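import piplates.DAQC2plate as DAQC2
import RPi.GPIO as GPIO
import time

try:
    # Toggle DOUT bit 5 on the DAQC2plate at address 0: 0.9 s on, 0.1 s off.
    while True:
        DAQC2.setDOUTbit(0, 5)
        time.sleep(0.9)
        DAQC2.clrDOUTbit(0, 5)
        time.sleep(0.1)

except KeyboardInterrupt:
    # On Ctrl-C, leave the pin low and release the GPIO pins.
    DAQC2.clrDOUTbit(0, 5)
    GPIO.cleanup()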

Execute it: python3 wigglepin.py

The DOUT 5 pin should start toggling, or sometimes you get the callstack listed below. Notice that the ADC values stop and the Pi-Plate is no longer accessible from Node-RED. Restart the flow. Nope. Still dead.

Restart Pi to reconnect to the Pi-Plate.

Wash-Rinse-Repeat

Error Callstack

Traceback (most recent call last):
  File "wigglepin.py", line 1, in <module>
    import piplates.DAQC2plate as DAQC2
  File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 748, in <module>
    quietPoll()
  File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 713, in quietPoll
    getCalVals(i)
  File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 727, in getCalVals
    values[j]=CalGetByte(addr,6*i+j)
  File "/usr/local/lib/python3.7/dist-packages/piplates/DAQC2plate.py", line 586, in CalGetByte
    return resp[0]
IndexError: list index out of range

ADC_Demo_flow.json.txt stepsToRepro.txt wigglepin.py.txt

mharsch commented 2 years ago

There should really only be one process talking to the Pi-Plates at a time. node-pi-plates (which is used by node-red-contrib-pi-plates under the hood) spawns a Python co-process (plate_io.py) that imports the Python pi-plates module and makes calls to the Pi-Plates API.

In order to allow multiple processes to share the Pi-Plates, we'd need a different architecture in which the one process talking to the Pi-Plates exposes an API that multiple consumers can connect to and make requests through (similar to what pigpiod does for GPIO).
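To make the single-owner idea concrete, here is a rough sketch of a broker process, assuming Python's standard multiprocessing.connection module as the transport. Nothing like this exists in node-pi-plates today; FakePlate, serve, handle, call, and the ALLOWED whitelist are hypothetical names for illustration, and a real broker would pass the piplates.DAQC2plate module in place of FakePlate.

```python
import threading
from multiprocessing.connection import Client, Listener

# Calls that clients are allowed to request (illustrative whitelist).
ALLOWED = {"getADC", "setDOUTbit", "clrDOUTbit"}


class FakePlate:
    """Hypothetical stand-in for piplates.DAQC2plate so the sketch runs anywhere."""

    def __init__(self):
        self.dout = set()

    def getADC(self, addr, channel):
        return 1.25

    def setDOUTbit(self, addr, bit):
        self.dout.add((addr, bit))

    def clrDOUTbit(self, addr, bit):
        self.dout.discard((addr, bit))


def handle(conn, plate, lock):
    """Serve one client: each request is a (method_name, args) tuple."""
    with conn:
        while True:
            try:
                name, args = conn.recv()
            except EOFError:
                return  # client disconnected
            if name not in ALLOWED:
                conn.send(("error", "unknown call: %s" % name))
                continue
            with lock:  # only one plate transaction in flight at a time
                try:
                    reply = ("ok", getattr(plate, name)(*args))
                except Exception as exc:
                    reply = ("error", repr(exc))
            conn.send(reply)


def serve(listener, plate):
    """Accept clients until the listener is closed."""
    lock = threading.Lock()
    while True:
        try:
            conn = listener.accept()
        except OSError:
            return  # listener was closed
        threading.Thread(target=handle, args=(conn, plate, lock), daemon=True).start()


def call(address, name, *args):
    """Client helper: open a connection, make one request, return the reply."""
    with Client(address) as conn:
        conn.send((name, args))
        return conn.recv()
```

With something like this in place, both the Node-RED co-process and a script like wigglepin.py would call call(address, "setDOUTbit", 0, 5) instead of importing piplates directly, so only the broker ever touches the plates.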

mharsch commented 2 years ago

So, what's happening when the plate(s) appear to stop responding from the Node-RED interface is a crash of the underlying python process that talks to the plates on behalf of the Node-RED pi-plates nodes. This is probably triggered by a reset of the pi plates microcontroller which means a period of time where the python API calls fail. The wiggle script can also fail in a similar way, but since it's making calls less frequently, it usually survives longer than the node-pi-plates plate_io.py process. The plates are actually back to being functional within a second or two, but the node-pi-plates python co-process won't be re-spawned until Node-RED itself is restarted (e.g. systemctl restart nodered).

So, first we should be providing more helpful error messages when our python co-process crashes: Harsch-Systems/node-pi-plates#10

Secondly, we should really survive such crashes by re-spawning the co-process automatically (or, better yet, offer a configuration option to respawn upon python process crash). Harsch-Systems/node-pi-plates#11

We should probably keep a counter of how many times we've restarted and mention that in the error message each time, so the user can easily detect this kind of 'dueling processes' failure scenario.
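As a rough illustration of the respawn-with-counter idea, here is a minimal Python sketch. node-pi-plates itself is a Node.js module, so this is not its implementation; supervise, max_restarts, backoff_s, and log are hypothetical names, and the restart limit and backoff are assumptions rather than anything decided in the linked issues.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_s=1.0, log=print):
    """Run cmd, respawning it after a non-zero exit, and report the restart count.

    Returns the number of restarts performed before the child exited
    cleanly or the restart limit was reached.
    """
    restarts = 0
    while True:
        code = subprocess.Popen(cmd).wait()
        if code == 0:
            log("co-process exited cleanly after %d restart(s)" % restarts)
            return restarts
        if restarts >= max_restarts:
            log("co-process exited with code %d; giving up after %d restart(s)"
                % (code, restarts))
            return restarts
        restarts += 1
        # Surfacing the counter lets the user spot a 'dueling processes' setup.
        log("co-process exited with code %d; restart #%d" % (code, restarts))
        time.sleep(backoff_s)


if __name__ == "__main__":
    # Example: supervise a child that always fails (stand-in for plate_io.py).
    supervise([sys.executable, "-c", "import sys; sys.exit(1)"], backoff_s=0.1)
```

A steadily climbing restart number in the log would be a strong hint that some other process is also talking to the plates.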

pi-plates commented 2 years ago

The Pi-Plates microcontroller does not reset unless explicitly told to do so. What can happen (in our later products) is that the processor will "give up" on a data exchange with the RPi if there is no response within 50 msec and reset the I/O process. This may lead to erroneous data being received by the RPi and/or a loss of synchronization. Our older products (DAQC, MOTOR, and RELAY) use a simpler protocol and require lots of undesirable delays to maintain synchronization.
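Until access to the plates is serialized, a caller could at least tolerate the short reads seen in the traceback above (an IndexError from an empty response) by retrying. The following is a minimal sketch, not part of the pi-plates library; call_with_retry, attempts, and delay_s are hypothetical names, and the retry count and delay are guesses rather than values recommended by Pi-Plates. Note that a retry cannot detect erroneous data that arrives without raising an exception.

```python
import time


def call_with_retry(fn, *args, attempts=5, delay_s=0.2, sleep=time.sleep):
    """Call fn(*args), retrying when it raises IndexError (an empty response).

    Re-raises the last IndexError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args)
        except IndexError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # give the plate time to resynchronize
    raise last_error
```

Usage would look like call_with_retry(DAQC2.getADC, 0, 0), assuming the piplates module imported successfully in the first place.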