adafruit / Adafruit_CircuitPython_PM25

CircuitPython library for PM2.5 sensors
MIT License

RuntimeError: Invalid PM2.5 header #24

Open wiseac opened 2 years ago

wiseac commented 2 years ago

Following this guide https://learn.adafruit.com/pm25-air-quality-sensor/python-and-circuitpython (with minor adjustments), I keep running into this error:

Traceback (most recent call last):
  File "code.py", line 190, in <module>
  File "/lib/adafruit_pm25/__init__.py", line 83, in read
RuntimeError: Invalid PM2.5 header

I am using a Teensy 4.1 instead of the Feather to control the WiFi module, temperature sensor, and PM2.5 sensor, and I am using the UART connection method. It runs for a random amount of time, sometimes 5 hours, sometimes 30 minutes, before running into this error. I'm unsure how to fix the problem other than looping a reset, and that does not actually solve it.

Thanks, W

KeithTheEE commented 2 years ago

Have you tried wrapping the code in a function to protect it, so that it tries again if it runs into an error, up to a specified retry limit? UART has a tendency to occasionally glitch out, and trying again a few times is a valid way to add robustness to your code. It might solve your problem; I'm not sure there's a better way around that kind of comms glitch.


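import time
import board
import busio
from adafruit_pm25.uart import PM25_UART


def read_pm25(pm25_sensor):
    """Read the PM2.5 sensor, retrying a few times on transient UART errors."""
    read_tries = 0
    read_attempt_limit = 5

    while read_tries < read_attempt_limit:
        try:
            particles = pm25_sensor.read()
            break
        except RuntimeError:
            print("RuntimeError while reading pm25, trying again. Attempt: ", read_tries)
            read_tries += 1
            time.sleep(0.1)
    if read_tries >= read_attempt_limit:
        # We tried too many times and it didn't work. Raise to alert the user there's an error
        raise RuntimeError("PM2.5 read failed after %d attempts" % read_attempt_limit)
    return particles


reset_pin = None  # or a DigitalInOut wired to the sensor's RESET pin
uart = busio.UART(board.TX, board.RX, baudrate=9600)  # adjust pins for your board
pm25_sensor = PM25_UART(uart, reset_pin)
particles = read_pm25(pm25_sensor)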

This lets it try again after sleeping for a short period of time, hopefully long enough for comms to recover. If it fails repeatedly, it still raises an error so you don't miss larger issues, like a physical wire disconnecting.

#9 indicates that there is a larger bug somewhere, and #10 has the foundations to implement a fix; in the interim, 'try again' might be a functional solution. Please let me know if this doesn't help, though. I have this code running on my microcontroller with the UART Plantower sensor, so I think it should work.

wiseac commented 2 years ago

Hello,

Thank you very much for this! I will definitely try it out. I did not think it could be the connection method itself; I'm still new to CircuitPython and Arduinos in general.

I will update you if anything new comes up. Best, W

KeithTheEE commented 2 years ago

Wanted to check back in, does the code seem more stable with this protection in place?

wiseac commented 2 years ago

Hello, I appreciate you checking in. At the moment the code is still hanging, but it is not always the header problem; sometimes it's a connection issue. So I've been adding more try/except checks to my code.

I was also wondering whether it's better to put setup code such as pm25 = PM25_UART(uart, reset_pin) inside the while loop with the try checks.

I am monitoring it in Mu right now to see what problems arise. Usually my checks reload the program, but something is causing the whole thing to break at random intervals.
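On the question of putting setup code like pm25 = PM25_UART(uart, reset_pin) inside the loop: one common pattern (a sketch, not something this thread settled on) is to create the sensor object once, outside the loop, and rebuild it only after several consecutive read failures. RecoveringSensor, make_sensor and rebuild_after are hypothetical names; make_sensor would be something like lambda: PM25_UART(uart, reset_pin). It reuses the existing uart object, because creating a second busio.UART on the same pins without first calling deinit() on the old one fails with a pin-in-use error.

```python
class RecoveringSensor:
    """Hypothetical wrapper: the sensor object is built once and rebuilt
    only after `rebuild_after` consecutive RuntimeErrors."""

    def __init__(self, make_sensor, rebuild_after=5):
        self._make_sensor = make_sensor  # e.g. lambda: PM25_UART(uart, reset_pin)
        self._rebuild_after = rebuild_after
        self._failures = 0
        self.sensor = make_sensor()

    def read(self):
        try:
            data = self.sensor.read()
        except RuntimeError:
            self._failures += 1
            if self._failures >= self._rebuild_after:
                # Too many failures in a row: start over with a fresh driver
                # object instead of re-running setup on every loop iteration
                self.sensor = self._make_sensor()
                self._failures = 0
            raise  # let the caller's retry/except logic still see the error
        self._failures = 0
        return data
```

Because the wrapper exposes the same read() method, it can be passed anywhere pm25 was used, for example pm25 = RecoveringSensor(lambda: PM25_UART(uart, reset_pin)).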

KeithTheEE commented 2 years ago

What uptime have you been getting before an error? Additionally, could you post a sample of your code and a selection of the errors you receive, with the full stack trace for each?

For the connection errors, are you sure your wiring is solid? I'm trying to figure out whether the issue is in this Adafruit_CircuitPython_PM25 library and, if so, where it is and what changes could mitigate it.

wiseac commented 2 years ago

Hello, I am watching my serial monitor now for the most common error. The uptime is all over the place: it could be hours, or it could be a few minutes. Here is my code:

import time
import board
import busio
from digitalio import DigitalInOut, Direction, Pull
from adafruit_esp32spi import adafruit_esp32spi, adafruit_esp32spi_wifimanager
from adafruit_io.adafruit_io import IO_HTTP
from simpleio import map_range
from adafruit_pm25.uart import PM25_UART
from adafruit_bme280 import basic as adafruit_bme280
import supervisor
import gc

gc.enable()
#microcontroller.on_next_reset(microcontroller.RunMode.NORMAL)

# Uncomment below for PMSA003I Air Quality Breakout
# from adafruit_pm25.i2c import PM25_I2C
# import adafruit_bme280

# Configure Sensor
# Return environmental sensor readings in degrees Celsius
USE_CELSIUS = True
# Interval the sensor publishes to Adafruit IO, in minutes
PUBLISH_INTERVAL = 10

### WiFi ###
# Get wifi details and more from a secrets.py file
try:
    from secrets import secrets
except ImportError:
    print("WiFi secrets are kept in secrets.py, please add them there!")
    raise

# AirLift FeatherWing
esp32_cs = DigitalInOut(board.D5)
esp32_ready = DigitalInOut(board.D9)
esp32_reset = DigitalInOut(board.D6)
esp32_gpio0 = DigitalInOut(board.D10)

spi = busio.SPI(board.SCK, board.MOSI, board.MISO)
esp = adafruit_esp32spi.ESP_SPIcontrol(
    spi, esp32_cs, esp32_ready, esp32_reset, esp32_gpio0
)

wifi = adafruit_esp32spi_wifimanager.ESPSPI_WiFiManager(esp, secrets, status_pixel=None, attempts=4)
# Connect to a PM2.5 sensor over UART
reset_pin = DigitalInOut(board.D16)
reset_pin.direction = Direction.OUTPUT
#reset_pin.value = False
uart = busio.UART(board.TX3, board.RX3, baudrate=9600)
pm25 = PM25_UART(uart, reset_pin)

# Create i2c object
i2c = busio.I2C(board.SCL, board.SDA, frequency=100000)

# Connect to a BME280 over I2C
bme280 = adafruit_bme280.Adafruit_BME280_I2C(i2c)
# Uncomment below for PMSA003I Air Quality Breakout
# pm25 = PM25_I2C(i2c, reset_pin)

# Uncomment below for BME680
# import adafruit_bme680
# bme_sensor = adafruit_bme680.Adafruit_BME680_I2C(i2c)

# Sensor Functions
def calculate_aqi(pm_sensor_reading):
    """Returns a calculated air quality index (AQI)
    and category as a tuple.
    NOTE: The AQI returned by this function should ideally be measured
    using the 24-hour concentration average. Calculating an AQI without
    averaging will result in higher AQI values than expected.
    :param float pm_sensor_reading: Particulate matter sensor value.

    """
    # Check sensor reading using EPA breakpoint (Clow-Chigh)
    try:
        if 0.0 <= pm_sensor_reading <= 12.0:
            # AQI calculation using EPA breakpoints (Ilow-IHigh)
            aqi_val = map_range(int(pm_sensor_reading), 0, 12, 0, 50)
            aqi_cat = "Good"
        elif 12.1 <= pm_sensor_reading <= 35.4:
            aqi_val = map_range(int(pm_sensor_reading), 12, 35, 51, 100)
            aqi_cat = "Moderate"
        elif 35.5 <= pm_sensor_reading <= 55.4:
            aqi_val = map_range(int(pm_sensor_reading), 36, 55, 101, 150)
            aqi_cat = "Unhealthy for Sensitive Groups"
        elif 55.5 <= pm_sensor_reading <= 150.4:
            aqi_val = map_range(int(pm_sensor_reading), 56, 150, 151, 200)
            aqi_cat = "Unhealthy"
        elif 150.5 <= pm_sensor_reading <= 250.4:
            aqi_val = map_range(int(pm_sensor_reading), 151, 250, 201, 300)
            aqi_cat = "Very Unhealthy"
        elif 250.5 <= pm_sensor_reading <= 350.4:
            aqi_val = map_range(int(pm_sensor_reading), 251, 350, 301, 400)
            aqi_cat = "Hazardous"
        elif 350.5 <= pm_sensor_reading <= 500.4:
            aqi_val = map_range(int(pm_sensor_reading), 351, 500, 401, 500)
            aqi_cat = "Hazardous"
        else:
            print("Invalid PM2.5 concentration")
            aqi_val = -1
            aqi_cat = None
        return aqi_val, aqi_cat
    except (ValueError, RuntimeError, ConnectionError, OSError) as e:
            print("Unable to read from sensor, retrying...")
            supervisor.reload()

def sample_aq_sensor():
    """Samples PM2.5 sensor
    over a 2.3 second sample rate.

    """
    try:
        aq_reading = 0
        aq_samples = []

        read_tries = 0
        read_attempt_limit = 5

        # initial timestamp
        time_start = time.monotonic()
        # sample pm2.5 sensor over 2.3 sec sample rate
        while (time.monotonic() - time_start) <= 2.3:
            try:
                aqdata = pm25.read()
                aq_samples.append(aqdata["pm25 env"])
                break
            except RuntimeError:
                print("RuntimeError while reading pm25, trying again. Attempt: ", read_tries)
                read_tries += 1
                time.sleep(0.1)
        if read_tries >= read_attempt_limit:
            raise RuntimeError
            # pm sensor output rate of 1s
            time.sleep(3)
        # average sample reading / # samples
        try:
            for sample in range(len(aq_samples)):
                aq_reading += aq_samples[sample]
            aq_reading = aq_reading / len(aq_samples)
            aq_samples = []
            return aq_reading
        except (ValueError, RuntimeError, ConnectionError, OSError) as e:
                print("Unable to read from sensor, retrying...")
                supervisor.reload()
    except (ValueError, RuntimeError, ConnectionError, OSError) as e:
            print("Unable to read from sensor, retrying...")
            supervisor.reload()

def read_bme(is_celsius=False):
    """Returns temperature and humidity
    from BME280/BME680 environmental sensor, as a tuple.

    :param bool is_celsius: Returns temperature in degrees celsius
                            if True, otherwise fahrenheit.
    """
    try:
        humid = bme280.humidity
        temp = bme280.temperature
        if not is_celsius:
            temp = temp * 1.8 + 32
        return temp, humid
    except (ValueError, RuntimeError, ConnectionError, OSError) as e:
        print("Failed to read BME280, retrying\n", e)
        supervisor.reload()

# Create an instance of the Adafruit IO HTTP client
io = IO_HTTP(secrets["aio_user"], secrets["aio_key"], wifi)

# Describes feeds used to hold Adafruit IO data
feed_aqi = io.get_feed("airquality-sensors.aqi")
feed_aqi_category = io.get_feed("airquality-sensors.category")
feed_humidity = io.get_feed("airquality-sensors.humidity")
feed_temperature = io.get_feed("airquality-sensors.temperature")

# Set up location metadata from secrets.py file
location_metadata = {
    "lat": secrets["latitude"],
    "lon": secrets["longitude"],
    "ele": secrets["elevation"],
}

elapsed_minutes = 0
prv_mins = 0

while True:
    try:
        print("Fetching time...")
        cur_time = io.receive_time()
        print("Time fetched OK!")
        # Hourly reset
        if cur_time.tm_min == 0:
            prv_mins = 0
    except (ValueError, RuntimeError, ConnectionError, OSError) as e:
        print("Failed to fetch time, retrying\n", e)
        supervisor.reload()

    try:
        if cur_time.tm_min >= prv_mins:
            print("%d min elapsed.." % elapsed_minutes)
            prv_mins = cur_time.tm_min
            elapsed_minutes += 1
    except (ValueError, RuntimeError, ConnectionError, OSError) as e:
        print("Failed to fetch time, retrying\n", e)
        supervisor.reload()
    try:
        if elapsed_minutes >= PUBLISH_INTERVAL:
            print("Sampling AQI...")
            aqi_reading = sample_aq_sensor()
            aqi, aqi_category = calculate_aqi(aqi_reading)
            # aqdata = pm25.read()
            # sampleaqi = aqdata["pm25 env"]
            # aqi, aqi_category = calculate_aqi(sampleaqi)
            print("AQI: %d" % aqi)
            print("Category: %s" % aqi_category)

            # temp and humidity
            print("Sampling environmental sensor...")
            temperature, humidity = read_bme(USE_CELSIUS)
            print("Temperature: %0.1f %s" % (temperature, "C" if USE_CELSIUS else "F"))
            print("Humidity: %0.1f %%" % humidity)

            # Publish all values to Adafruit IO
            print("Publishing to Adafruit IO...")
            io.send_data(feed_aqi["key"], str(aqi), location_metadata)
            io.send_data(feed_aqi_category["key"], aqi_category)
            io.send_data(feed_temperature["key"], str(temperature))
            io.send_data(feed_humidity["key"], str(humidity))
            print("Published!")
            elapsed_minutes = 0
    except (ValueError, RuntimeError, ConnectionError, OSError) as e:
        print("Failed to send data to IO, retrying\n", e)
        supervisor.reload()
        # Reset timer
    time.sleep(30)

wiseac commented 2 years ago

So I've noticed it mainly hangs when it reaches the "Sampling AQI" section. It just doesn't run the code after that, and it doesn't raise an error either. I'm pretty sure my wiring is solid, but soon I'm going to solder all of this onto a board. Is there a way to implement a timeout?
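A sketch of a time-bounded sample_aq_sensor, assuming the same pm25.read() API and "pm25 env" key used in the code above. In the posted version, the break after the first successful read means only one sample is ever averaged, the time.sleep(3) after raise RuntimeError can never run, and if every read in the window fails without reaching five attempts, the division by len(aq_samples) raises ZeroDivisionError, which none of the except clauses catch. This variant takes the read function as a parameter (so it can be tested off-board), stops at a deadline, and returns None when no read succeeded. A software deadline only bounds the loop, though: it cannot interrupt a single pm25.read() call, which over UART can block for up to the busio.UART timeout (1 second by default).

```python
import time


def sample_aq_sensor(read_fn, window_s=2.3, sample_delay_s=1.0, retry_delay_s=0.1):
    """Average the "pm25 env" value over a time window.

    read_fn is the sensor's read method (e.g. pm25.read). Returns the mean
    of the successful readings, or None if every read in the window failed,
    so the caller can decide what to do instead of dividing by zero.
    """
    samples = []
    failures = 0
    deadline = time.monotonic() + window_s
    while time.monotonic() < deadline:
        try:
            samples.append(read_fn()["pm25 env"])
        except RuntimeError as err:
            failures += 1
            print("PM2.5 read failed (%d so far): %s" % (failures, err))
            time.sleep(retry_delay_s)
            continue
        # pm sensor output rate of 1s, so wait before taking the next sample
        time.sleep(sample_delay_s)
    if not samples:
        return None
    return sum(samples) / len(samples)
```

In the main loop this would be called as aqi_reading = sample_aq_sensor(pm25.read), skipping the publish cycle when it returns None rather than passing None to calculate_aqi().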

KeithTheEE commented 2 years ago

Before you solder things in place, I'd suggest removing the other peripheral sensors and running a stripped-down version of this program with the PM2.5 sensor alone. It strikes me as odd that the AQI section only hangs occasionally, and it's easier to divide the problem into smaller pieces this way.

Additionally, the error could be a power issue: with all of the sensors attached, the microcontroller may draw just enough power that the comms are on the edge of working. If removing all of the other sensors and running this for a while doesn't solve the problem, then we can be more confident that the issue has to do with the communication protocol.

wiseac commented 2 years ago

Okay, thank you for the suggestion. I will try that. You saying that reminds me that the Teensy 4.1 runs at a high clock frequency and draws a lot of power because of it. I might also try reducing the clock rate and seeing if that helps too!

KeithTheEE commented 2 years ago

How is it running with fewer sensors attached?

wiseac commented 2 years ago

So I just moved, and I am only now setting up the sensor again (sorry about how long this is taking). I've noticed that it's the same for the most part. Still hangs on the "Sampling AQI" part which if i removed that function and just went with the basic read then it would be the header problem. I am going to rewire everything again just in case something came loose during the move.

[Attached photo: 20220817_112157]

KeithTheEE commented 2 years ago

> Still hangs on the "Sampling AQI" part which if i removed that function and just went with the basic read then it would be the header problem.

Is this with the other sensors removed? If this is a power issue and the other sensors are still connected, their power draw could still be a problem even if the code isn't calling them.

KeithTheEE commented 1 year ago

Just wanted to check again and see if better wiring and fewer peripherals drawing power helped increase the stability. I haven't been able to replicate this hanging, but I'm using a Metro ESP32-S2.

wiseac commented 1 year ago

Hello, I just soldered a lot of the components onto a breakout board, and it seems to be running well so far. It has always hung within 24 hours, so I'll be sure to give an update tomorrow.

KeithTheEE commented 1 year ago

How has it been running? Is it stable or is the bug persistent?

wiseac commented 1 year ago

I believe I'm still running into the same bug. Sometimes it is hard to tell exactly what is going wrong.

1) The serial monitor shows me my errors until the board actually disconnects from my computer and stops running (this happens most of the time when I leave it overnight).
2) I'm trying to see whether my computer's ports are causing a power supply problem, so I'm using an external 5 V / 1 A power brick to power the Teensy, which powers the rest of the electronics.

It's weird to me that it will run for, let's say, 12 hours and then randomly freeze completely. If it were a power issue, I would assume it would stop earlier. Then again, it could be that at some point all the components draw too much power at once and the Teensy freezes.

wiseac commented 1 year ago

Update: it has been running for 24 hours, which is the longest it has ever gone. I created a boot.py file and added:

import board
import supervisor
import microcontroller

microcontroller.cpu.frequency = 150000000
microcontroller.on_next_reset(microcontroller.RunMode.NORMAL)

supervisor.enable_autoreload()

What might be happening is that whenever the MCU hard crashes or resets due to some type of failure (I'm still unsure whether it's due to the PM sensor), it doesn't boot back up properly with the correct configuration. Basically, I still haven't figured out what the problem is, but my guess is that what I did makes it restart.

KeithTheEE commented 1 year ago

Hey, I wanted to check back in on this and see whether it has been stable, or whether you've narrowed down if the issue is with the PM2.5 sensor or not.

Hope the project has been going well and been collecting informative data this past month!

wiseac commented 1 year ago

So, yes and no. It runs for longer than a day but still crashes. When it crashes, I can't see any reports, and it does not reset or reboot.

Do you know of a way to log crash reports or collect logs?

Thanks for reaching out again. I've been quite busy with other projects, and I apologize.
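For the case where the board stops and never resets, CircuitPython's hardware watchdog is one option worth trying: microcontroller.watchdog in WatchDogMode.RESET resets the chip if the program stops calling feed() within the timeout. Whether the Teensy 4.1 port supports the watchdog, and its allowed timeout range, depend on the CircuitPython version, so this sketch falls back to None when no watchdog is available (including on desktop Python). start_watchdog and sleep_feeding are hypothetical helper names. Because the main loop sleeps for 30 seconds, which may exceed the maximum watchdog timeout on many ports, the sleep is split into shorter steps that feed the watchdog.

```python
import time

try:
    import microcontroller
    from watchdog import WatchDogMode
except ImportError:  # desktop Python, or a build without watchdog support
    microcontroller = None


def start_watchdog(timeout_s=8.0):
    """Arm the hardware watchdog in RESET mode and return it.

    Returns None when no watchdog is available. The allowed timeout
    range is port-specific.
    """
    wdt = getattr(microcontroller, "watchdog", None)
    if wdt is None:
        return None
    wdt.timeout = timeout_s
    wdt.mode = WatchDogMode.RESET  # setting the mode starts the watchdog
    return wdt


def sleep_feeding(wdt, seconds, step_s=1.0):
    """Sleep for `seconds`, feeding `wdt` (if any) at least every `step_s`."""
    remaining = seconds
    while remaining > 0:
        if wdt is not None:
            wdt.feed()
        time.sleep(min(step_s, remaining))
        remaining -= step_s
```

In the main loop, wdt = start_watchdog() would be called once before while True, wdt.feed() at the top of each iteration (when wdt is not None), and time.sleep(30) replaced by sleep_feeding(wdt, 30). The timeout has to be longer than the slowest single step, such as an Adafruit IO request through the ESP32 co-processor.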

KeithTheEE commented 1 year ago

Hmm.

Boy, this opens up a bunch of open-ended questions, haha. Let's start working through them.

First, do you think the PM2.5 sensor is the issue? If not, it might be best to close this and move the discussion somewhere more appropriate (where folks more familiar with the 'more likely core issue' are). Next, what version of CircuitPython are you running on the Teensy 4.1?

I don't know about crash reports or logs (that's an aspect of CircuitPython I need to get a better handle on), but you could do the inverse and make 'success' logs:

Are you writing to the SD card at any point? It might help to write to the SD card a lot:

With each of these writes, it'll be helpful to add some extra info, like gc.mem_free(), so we can also track memory in case there's a memory leak in your code.

Basically, this will create a ton of data, so you know exactly where the program was when it broke. Over time you can see whether it consistently breaks at the same location; if it breaks at inconsistent locations, it might be an entirely different issue from the one we're zeroing in on.
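A minimal sketch of such a 'success' log, assuming the SD card is mounted at /sd (for example with storage.mount) and using a hypothetical log_checkpoint() helper. Each call appends one CSV line with a millisecond timestamp, a label for where the program is, and gc.mem_free(). It uses time.monotonic_ns() (available on builds with long-integer support) because CircuitPython's floats have limited precision, so time.monotonic() loses resolution as uptime grows. The file is opened and closed on every call so each line is written out before the next step runs.

```python
import gc
import time


def log_checkpoint(path, label):
    """Append "elapsed_ms,label,mem_free" to a CSV log and return the line.

    gc.mem_free() exists on CircuitPython but not on desktop Python,
    so -1 is logged when it is unavailable.
    """
    elapsed_ms = time.monotonic_ns() // 1000000
    mem_free = gc.mem_free() if hasattr(gc, "mem_free") else -1
    line = "%d,%s,%d\n" % (elapsed_ms, label, mem_free)
    # Open/close per call: the line is written out when the file closes
    with open(path, "a") as log_file:
        log_file.write(line)
    return line
```

On the board this would be called around each step of the main loop, e.g. log_checkpoint("/sd/pm25_log.csv", "sampling_aqi") before sampling and log_checkpoint("/sd/pm25_log.csv", "published") after sending to Adafruit IO; the last label in the file then shows where the program was when it stopped.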

I still suspect that what you're experiencing is a power supply issue, with a brownout resetting your board (I had a similar issue on one of my boards: long uptime, then a random crash that I couldn't pinpoint). Still, this logging might help you identify whether the error is consistently in the same location, and give you an idea of the state of the board before it crashed.
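To check the brownout theory more directly, CircuitPython reports the cause of the last reset as microcontroller.cpu.reset_reason, with values such as POWER_ON, BROWNOUT, RESET_PIN and WATCHDOG; some ports may only report UNKNOWN. A sketch that reads it at startup so it can be printed or written to a log (describe_reset is a hypothetical name):

```python
def describe_reset():
    """Return the cause of the last reset as text.

    On CircuitPython this is microcontroller.cpu.reset_reason (a
    microcontroller.ResetReason value). Returns "unavailable" when the
    module or attribute does not exist, e.g. on desktop Python.
    """
    try:
        import microcontroller

        return str(microcontroller.cpu.reset_reason)
    except (ImportError, AttributeError, NotImplementedError):
        return "unavailable"
```

Calling print("Reset reason:", describe_reset()) at the very top of code.py records why each run started. A BROWNOUT entry after an unexplained restart would support the power-supply theory; it cannot help when the board freezes without resetting.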

wiseac commented 1 year ago

Thank you for the suggestion. I will implement that and look into it. I'm running the latest stable CircuitPython build. If you are running a similar build, what temperature sensor are you using? I just read that the BME280 sensor might have its own brownouts that could be causing the problem. (Of course that's not the PM sensor, but I was wondering because I've never gotten an error from that sensor, even though it could be the thing causing the problem.)

KeithTheEE commented 1 year ago

My setup has a few more peripherals on it. The base micro is an Adafruit Metro ESP32-S2, with a BME280, an SGP40, and an SCD40 on the I2C bus, and the same serial PM2.5 sensor you have on the serial pins. It used to have an Adafruit MiCS-5524 Gas Sensor Breakout on an analog pin as well.

The code loop sampled every sensor at 1 Hz, and when a data queue was full, it would post the queued sensor values to a home server waiting for the data.
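The sample-and-batch loop described here could look roughly like the sketch below. read_sensors and post_batch are hypothetical callables standing in for the sensor reads and the post to the home server, neither of which is shown in the thread; the iterations parameter exists only so the loop can be tested.

```python
import time


def sample_and_batch(read_sensors, post_batch, batch_size=60, period_s=1.0, iterations=None):
    """Sample at a fixed period and post the readings in batches.

    read_sensors() returns one reading; post_batch(list) sends a full batch.
    iterations=None runs forever; a number stops after that many samples
    and returns the readings not yet posted.
    """
    queue = []
    count = 0
    while iterations is None or count < iterations:
        started = time.monotonic()
        queue.append(read_sensors())
        count += 1
        if len(queue) >= batch_size:
            post_batch(queue)
            queue = []  # start a new list so the posted batch is untouched
        # Sleep for whatever is left of the sampling period (1 Hz by default)
        time.sleep(max(0.0, period_s - (time.monotonic() - started)))
    return queue
```

Batching keeps the power-hungry Wi-Fi transmissions to one every batch_size samples instead of one per sample; if post_batch raises, the exception propagates and the unposted readings are lost, so real code would need its own handling there.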

It would occasionally hit a random error and hard crash. I had uptime on the order of weeks to months before a crash. Meanwhile, a similar sensor node set up outside, using an Adafruit ESP32-S2 Feather with the BME280, SGP40, and an I2C PM2.5 sensor, never crashed. The two microcontrollers were running the same control loop, so I was sure the hard crash on the Metro wasn't related to the code.

Recently I redesigned the hardware housing for the Metro so I could mount the sensors on the wall (since I liked the data it was collecting), and chose to remove the Adafruit MiCS-5524 Gas Sensor. Since that redesign I haven't experienced a crash, and its uptime now matches the Feather board's. I had suspected for a while that the MiCS-5524 was putting me over my power budget, and now that I've removed it, I think I was probably right.

Looking at your board setup, I know the Teensy 4.1 draws a lot of power, anything that uses WiFi draws a lot of power, and writing to an SD card can consume a lot of power too. Depending on what's going on in the code, it might be trying to read from the serial bus right after a large power draw, and be unable to send data at the correct voltages because of that. The intermittency of your issues, the number of power-hungry parts connected to the board, and how hard it's been to pinpoint the source of the problem all make me think you're experiencing a power issue like I did. But that could be my own recency bias, since it was the problem I recently solved.

Regardless of the root cause, over-logging might help start narrowing down some variables. Make sure you also include timestamps, so we can get a sense of how much time passed between each type of behavior; that'll be a huge help too.

KeithTheEE commented 1 year ago

How has the added logging gone over the past couple of weeks? Has it helped focus on a region that's causing an issue, or is it inconsistent?

KeithTheEE commented 1 year ago

Just going to reach out once more to see if you've been able to log more information, or if you've removed peripherals for more stable power. The closest I ever got to 'reliably' reproducing the error (I say 'reliably' because of its intermittency, and the part of the code that broke would change) was when I overloaded my board with too many peripherals and started occasionally browning it out. Since removing the Adafruit MiCS-5524 Gas Sensor from my setup, I haven't had it crash in months.