Ribbit-Network / ribbit-network-frog-hardware

The sensor for the world's largest crowdsourced network of open-source, low-cost, GHG Gas Detection Sensors.
https://www.ribbitnetwork.org/
MIT License

SCD30 Stops Communicating Sometimes #61

Closed: keenanjohnson closed this issue 2 years ago

keenanjohnson commented 3 years ago

As I've deployed more Frog sensors, I've noticed that occasionally some of the SCD30 sensors will stop responding. The condition is resolved after a power cycle, so I don't believe that it's anything physical (loose connector, etc.), but I'm open to being proven wrong.

I haven't been able to correlate this to any particular event.

Theories for cause:

Example Error Log

Service exited 'co2 sha256:d3e002ec75c5d21446f919b309b83ef25eac24c214e0c025227e0ef4386e8e2e'
Restarting service 'co2 sha256:d3e002ec75c5d21446f919b309b83ef25eac24c214e0c025227e0ef4386e8e2e'
 co2  Traceback (most recent call last):
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_bus_device/i2c_device.py", line 154, in __probe_for_device
 co2      self.i2c.writeto(self.device_address, b"")
 co2    File "/usr/local/lib/python3.9/site-packages/busio.py", line 159, in writeto
 co2      return self._i2c.writeto(address, buffer, stop=stop)
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_blinka/microcontroller/generic_linux/i2c.py", line 49, in writeto
 co2      self._i2c_bus.write_bytes(address, buffer[start:end])
 co2    File "/usr/local/lib/python3.9/site-packages/Adafruit_PureIO/smbus.py", line 314, in write_bytes
 co2      self._device.write(buf)
 co2  TimeoutError: [Errno 110] Connection timed out
 co2  
 co2  During handling of the above exception, another exception occurred:
 co2  
 co2  Traceback (most recent call last):
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_bus_device/i2c_device.py", line 160, in __probe_for_device
 co2      self.i2c.readfrom_into(self.device_address, result)
 co2    File "/usr/local/lib/python3.9/site-packages/busio.py", line 149, in readfrom_into
 co2      return self._i2c.readfrom_into(address, buffer, stop=stop)
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_blinka/microcontroller/generic_linux/i2c.py", line 56, in readfrom_into
 co2      readin = self._i2c_bus.read_bytes(address, end - start)
 co2    File "/usr/local/lib/python3.9/site-packages/Adafruit_PureIO/smbus.py", line 181, in read_bytes
 co2      return self._device.read(number)
 co2  TimeoutError: [Errno 110] Connection timed out
 co2  
 co2  During handling of the above exception, another exception occurred:
 co2  
 co2  Traceback (most recent call last):
 co2    File "/usr/src/co2.py", line 55, in <module>
 co2      scd = adafruit_scd30.SCD30(i2c_bus)
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_scd30.py", line 93, in __init__
 co2      self.i2c_device = i2c_device.I2CDevice(i2c_bus, address)
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_bus_device/i2c_device.py", line 50, in __init__
 co2      self.__probe_for_device()
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_bus_device/i2c_device.py", line 163, in __probe_for_device
 co2      raise ValueError("No I2C device at address: 0x%x" % self.device_address)
 co2  ValueError: No I2C device at address: 0x61
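
In the log above, the service exits as soon as the SCD30 constructor fails its probe, and the supervisor restarts it. A minimal sketch of retrying the probe inside the script instead is shown below. It assumes the sensor is created through a zero-argument callable; `connect_with_retry` and its parameters are hypothetical names, not part of the Adafruit libraries. Retrying will not free a wedged bus, but it keeps the failure visible in one place.

```python
import time


def connect_with_retry(factory, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call factory() until it succeeds or the attempts run out.

    factory is any zero-argument callable that returns the sensor object,
    for example: lambda: adafruit_scd30.SCD30(i2c_bus).
    The probe in the traceback raises ValueError when nothing answers,
    and the lower layers raise OSError (TimeoutError is a subclass).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return factory()
        except (ValueError, OSError) as err:
            last_error = err
            if attempt < attempts - 1:
                # Exponential backoff between probes: 1 s, 2 s, 4 s, ...
                delay = base_delay * (2 ** attempt)
                print(f"probe failed ({err}); retrying in {delay:g} s")
                sleep(delay)
    raise RuntimeError(
        f"sensor did not respond after {attempts} attempts"
    ) from last_error
```

In co2.py this would wrap the existing constructor call, e.g. `scd = connect_with_retry(lambda: adafruit_scd30.SCD30(i2c_bus))`.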
keenanjohnson commented 3 years ago

It seems possible that slowing down the I2C bus may also help!

https://learn.adafruit.com/circuitpython-on-raspberrypi-linux/i2c-clock-stretching

keenanjohnson commented 3 years ago

Seems like this can be set via

# Clock stretching by slowing down to 10KHz
dtparam=i2c_arm_baudrate=10000
keenanjohnson commented 3 years ago

@mschwanzer this appears to be what's happening with your sensor. I'm going to try slowing the clock speed on I2C to see if this stops it.

keenanjohnson commented 3 years ago

I changed the clock speed on a few test devices and verified by running the command:

cat /sys/kernel/debug/clk/clk_summary

I'll watch those few to see if I can get the condition to repeat.

keenanjohnson commented 2 years ago

Unfortunately, after lowering this setting to 10000, I've still seen the sensor disconnection issue recur.

keenanjohnson commented 2 years ago

@eaudiffred it looks like my software fix here did not resolve the issue with your sensor. I'm going to try a second software fix related to the clock stretching of the communication bus that seems more promising!

More information: https://github.com/RequestForCoffee/rpi-i2c-timings

eaudiffred commented 2 years ago

Great, thanks! Let me know when to reboot.

keenanjohnson commented 2 years ago

Will do @eaudiffred!

Note for myself, the CM4 uses bcm2835 (i2c@7e804000)

keenanjohnson commented 2 years ago

OK, I tried for a while to use the rpi-i2c-timings utility above, but couldn't get it to function (see https://github.com/RequestForCoffee/rpi-i2c-timings/issues/1).

I found this related issue, which suggested using the parameter force_turbo=1 in config.txt, so I'm going to try that.

keenanjohnson commented 2 years ago

@eaudiffred if you want to give your sensor another reboot, we can see if that helps things :)

djgood commented 2 years ago

I’m also seeing this occasionally on my sensor (and right now, actually). We should see what’s happening on the bus lines with a scope; I can do that tomorrow. I’m wondering if we’re running into a stuck/blocked I2C bus (https://www.pebblebay.com/i2c-lock-up-prevention-and-recovery/). A quick Google search shows that the I2C driver for the Raspberry Pi doesn’t implement any kind of I2C recovery routines, which is unfortunate because we’d have to implement our own. Sounds like fun, though :)

keenanjohnson commented 2 years ago

Yes! It would be super helpful if you have a scope or a logic analyzer to figure out exactly what is happening. I think something like that might be the problem, as the SCD30 datasheet mentions that I2C clock stretching needs to be supported. I can imagine that a lack of clock stretching support could lead to the stuck bus.

eaudiffred commented 2 years ago

Restarted this morning and moved it back outside. Just checked and unfortunately I don't see it online. I'll give it another restart when I get home from work. In the past it's taken 2 or 3 power cycles before I see the dot on the ribbit network map.

djgood commented 2 years ago

Scope captures

Looking at the bus while co2 service was reporting ValueError: No I2C device at address: 0x61.

[Image: scope capture of the bus while the error was occurring (Screen Shot 2021-11-12)]

No clock cycles so it seems like it's not even trying to clock in data. Kind of a misleading error.

I disconnected the DPS310 to see if that would resolve anything, but nothing changed.

After power cycling SCD-30 by disconnecting/reconnecting the qwiic connect cable:

[Image: scope capture of the bus after power cycling the SCD-30]

Bus looks happy and co2 service is functioning normally. I'd put my money on it being an issue where the clock stretching on the SCD-30 isn't supported. Do you know if the Adafruit library uses hardware or software i2c? The i2c peripheral on the bcm2835 apparently has a buggy implementation of clock stretching, but some software libraries support it better.

Less critical but ideally Ribbit would have a way to recover from a stuck bus, since it's bound to crop up somehow. A potential robustness improvement, maybe.

keenanjohnson commented 2 years ago

Thanks for those scope traces @djgood! So since you disconnected the SCD30, does that make it most likely that the stuck condition is within the BCM chip on the Raspberry Pi?

I believe it uses the hardware i2c.

I had theorized that adding a power switch to turn power to the SCD30 off and on in case of a stuck bus would resolve the issue. It seems like your testing confirms this, correct? Perhaps there is a better way in software, but as you mentioned, maybe the buggy bcm stuff prevents that.

Sparkfun used to make a nice QWIIC power module but it's out of production it seems. Shouldn't be too hard for us to reproduce if we had to.

djgood commented 2 years ago

No problem! Hm, I was thinking that it was the SCD-30 that was stuck in that clock stretching condition, which blocks the bus and the Raspberry Pi doesn't know how to detect it. So once the SCD-30 releases the bus when it's powered off the Raspberry Pi can continue driving the bus as normal.

I think if we could power cycle the SCD30 (or even reset) that would fix the issue but it seems like the easiest alternative to me is to use the software I2C bus, more info here: https://github.com/fivdi/i2c-bus/blob/master/doc/raspberry-pi-software-i2c.md

This is a good discussion on I2C clock stretching on the Raspberry Pis: https://raspberrypi.stackexchange.com/questions/127271/does-the-raspberry-pi-i²c-bus-support-clock-stretching

keenanjohnson commented 2 years ago

Yeah makes sense. Seems like if the software i2c is the way to go, let's try it! I believe the scd30 library can be configured to use the new pins.

There isn't a way to use the existing hardware I2C pins as software I2C pins, right? That would be awesome because we could update all the frogs in the field without a hardware change, but it might not be possible.

keenanjohnson commented 2 years ago

Based on this forum post, it seems like it might be possible to use the existing hardware I2C pins as GPIOs for the software I2C. That would be rad.

djgood commented 2 years ago

Yeah, looks like that's totally doable! Awesome!

keenanjohnson commented 2 years ago

I tried testing the software i2c by adding the following, but it doesn't seem to be working.

dtparam=i2c_arm=off
dtparam=i2c=off
dtoverlay=i2c-gpio,i2c_gpio_sda=2,i2c_gpio_scl=3

I created a balena forum post here to see if anyone else has any additional tips.

djgood commented 2 years ago

Have you tried specifying different pins? Wondering if the problem is with disabling the hardware i2c or enabling the software i2c. I’ll try getting something working on my device

keenanjohnson commented 2 years ago

Yes I tried the same configuration with different pins, but same thing

dtparam=i2c_arm=off
dtparam=i2c=off
dtoverlay=i2c-gpio,i2c_gpio_sda=23,i2c_gpio_scl=24

keenanjohnson commented 2 years ago

Per this forum post discussing the software i2c implementation, it seems like I should be using the settings below instead of my settings above. Will test shortly.

dtparam=i2c_arm=off
dtparam=i2c-gpio=on
dtoverlay=i2c-gpio,i2c_gpio_sda=2,i2c_gpio_scl=3
keenanjohnson commented 2 years ago

Enabling Software I2C

All right! I was able to successfully enable the i2c-gpio interface on the same pins (2 and 3) as the hardware I2C on the Raspberry Pi via the following configuration:

Define DT parameters = "i2c_arm=off","i2c-gpio=on"
Define DT overlays = "dwc2,dr_mode=host","i2c-gpio,i2c_gpio_delay_us=20,i2c_gpio_sda=2,i2c_gpio_scl=3"

This allowed me to see the I2C bus and verify everything as shown below:

root@5582468:/usr/src# dmesg | grep i2c
[   10.451224] i2c-gpio ffffffff00000002.i2c: using lines 2 (SDA) and 3 (SCL)
[   15.229958] i2c /dev entries driver
root@5582468:/usr/src# i2cdetect -l
i2c-11  i2c             ffffffff00000002.i2c                    I2C adapter
root@5582468:/usr/src# i2cdetect -y 11
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- 61 -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- 77
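
The same check can be scripted. The sketch below parses `i2cdetect -y` output like the listing above and returns the addresses that answered; `parse_i2cdetect` is a hypothetical helper name, and the sample text is an abbreviated copy of the output above. 0x61 is the SCD30; 0x77 is presumably the DPS310 barometer.

```python
import re


def parse_i2cdetect(output):
    """Return the set of 7-bit addresses that answered in `i2cdetect -y` output."""
    found = set()
    for line in output.splitlines():
        # Data rows start with a two-digit hex row label followed by a colon.
        match = re.match(r"^\s*([0-9a-f]{2}):(.*)$", line)
        if not match:
            continue  # header row or blank line
        for cell in match.group(2).split():
            # "--" means no answer and "UU" means claimed by a kernel driver;
            # only two-digit lowercase hex cells are responding addresses.
            if re.fullmatch(r"[0-9a-f]{2}", cell):
                found.add(int(cell, 16))
    return found


SAMPLE = """\
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- 61 -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- 77
"""

present = parse_i2cdetect(SAMPLE)
missing = {0x61} - present  # empty when the SCD30 is visible on the bus
print(sorted(hex(a) for a in present), "missing:", sorted(hex(a) for a in missing))
```

On a device, the text could come from `subprocess.run(["i2cdetect", "-y", "11"], capture_output=True, text=True).stdout`, with 11 being the bus number reported by `i2cdetect -l` above.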

Testing to see if it fixes the issue with I2C bus lock-up

I then had to make a few changes to the source code to allow the SCD30 Python code to use the software I2C bus.

I first installed the https://github.com/adafruit/Adafruit_Python_Extended_Bus library in order to easily create the I2C object required by the SCD30 sensor constructor. I modified the co2.py file to initialize the I2C bus as shown below:

import adafruit_scd30
from adafruit_extended_bus import ExtendedI2C as I2C

# Specify bus number 11 (the software I2C bus listed as i2c-11 by i2cdetect -l)
i2c_bus = I2C(11)
scd = adafruit_scd30.SCD30(i2c_bus)

I started up the python script and I was able to connect and read data from the SCD30 and barometer just like before!

Unfortunately, this did not resolve the issue and the bus locked up in the same fashion as before after a few hours.

 co2  Traceback (most recent call last):
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_bus_device/i2c_device.py", line 154, in __probe_for_device
 co2      self.i2c.writeto(self.device_address, b"")
 co2    File "/usr/local/lib/python3.10/site-packages/busio.py", line 166, in writeto
 co2      return self._i2c.writeto(address, buffer, stop=stop)
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_blinka/microcontroller/generic_linux/i2c.py", line 49, in writeto
 co2      self._i2c_bus.write_bytes(address, buffer[start:end])
 co2    File "/usr/local/lib/python3.10/site-packages/Adafruit_PureIO/smbus.py", line 314, in write_bytes
 co2      self._device.write(buf)
 co2  OSError: [Errno 6] No such device or address
 co2  
 co2  During handling of the above exception, another exception occurred:
 co2  
 co2  Traceback (most recent call last):
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_bus_device/i2c_device.py", line 160, in __probe_for_device
 co2      self.i2c.readfrom_into(self.device_address, result)
 co2    File "/usr/local/lib/python3.10/site-packages/busio.py", line 156, in readfrom_into
 co2      return self._i2c.readfrom_into(address, buffer, stop=stop)
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_blinka/microcontroller/generic_linux/i2c.py", line 56, in readfrom_into
 co2      readin = self._i2c_bus.read_bytes(address, end - start)
 co2    File "/usr/local/lib/python3.10/site-packages/Adafruit_PureIO/smbus.py", line 181, in read_bytes
 co2      return self._device.read(number)
 co2  OSError: [Errno 6] No such device or address
 co2  
 co2  During handling of the above exception, another exception occurred:
 co2  
 co2  Traceback (most recent call last):
 co2    File "/usr/src/co2.py", line 57, in <module>
 co2      scd = adafruit_scd30.SCD30(i2c_bus)
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_scd30.py", line 93, in __init__
 co2      self.i2c_device = i2c_device.I2CDevice(i2c_bus, address)
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_bus_device/i2c_device.py", line 50, in __init__
 co2      self.__probe_for_device()
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_bus_device/i2c_device.py", line 163, in __probe_for_device
 co2      raise ValueError("No I2C device at address: 0x%x" % self.device_address)
 co2  ValueError: No I2C device at address: 0x61

I had to power cycle the full system to recover.

Next Steps

I'm not exactly sure what to try next. I'm going to try slowing down the I2C bus a bit more via the i2c_gpio_delay_us=20 parameter.

keenanjohnson commented 2 years ago

Setting i2c_gpio_delay_us=100 which should correspond to 10kHz I2C speed
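
A quick sanity check on that speed, as a sketch under one assumption: that the delay is half of one SCL period, which is how the comments in the kernel's i2c-gpio driver appear to treat it. Under that assumption a 100 µs delay gives a nominal 5 kHz, and 50 µs would give 10 kHz. GPIO overhead will make the real clock somewhat slower, so a scope measurement is the authoritative answer. `nominal_scl_khz` is a hypothetical helper name.

```python
def nominal_scl_khz(delay_us):
    """Nominal SCL frequency in kHz if delay_us is half of one clock period."""
    if delay_us <= 0:
        raise ValueError("delay_us must be positive")
    period_us = 2 * delay_us   # one low phase plus one high phase
    return 1000.0 / period_us  # a 1000 us period corresponds to 1 kHz


for delay in (20, 50, 100):
    print(f"i2c_gpio_delay_us={delay}: about {nominal_scl_khz(delay):g} kHz nominal")
```

Either way, 100 µs is slower than the 20 µs setting that locked up, which is the point of the change.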

djgood commented 2 years ago

Argggg! That's frustrating that it didn't solve the problem.

If slowing down the bus doesn't work, another thing we can try as a last resort is to bit bang those GPIOs to generate a bunch of clock cycles if we detect that the bus is locked up. That would hopefully get the SCD30 to free it. Not sure how difficult that would be with our current code; maybe we can interact with the software I2C somehow? Also, that's assuming that the SCD30 isn't just completely wedged and would actually respond.
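
A rough sketch of that recovery idea follows. It clocks SCL up to nine times and stops once SDA reads high. The GPIO access is passed in as two callables so the logic can be tried without hardware; `recover_i2c_bus`, `set_scl`, and `read_sda` are hypothetical names, and wiring them to real pins (with the i2c-gpio overlay released first) is left open. This only helps when a device is holding SDA low mid-transfer. If the SCD30 is holding SCL low, as a stuck clock stretch would, extra clocks cannot be generated and a power cycle is still needed.

```python
import time


def recover_i2c_bus(set_scl, read_sda, pulses=9, half_period_s=0.00005,
                    sleep=time.sleep):
    """Toggle SCL until SDA is released, sending at most `pulses` clock cycles.

    set_scl(level): drive SCL low (False) or release it high (True).
    read_sda(): return True when SDA reads high (bus released).
    Returns the number of pulses sent, or None if SDA never went high.
    """
    for sent in range(pulses + 1):
        if read_sda():
            return sent  # SDA is free; the master can issue a STOP and carry on
        if sent == pulses:
            break  # all pulses used and SDA is still held low
        set_scl(False)
        sleep(half_period_s)
        set_scl(True)
        sleep(half_period_s)
    return None
```

A `None` result would be the cue to fall back to power cycling the sensor.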

keenanjohnson commented 2 years ago

I have great news! After running the test Raspberry Pi for over 48 hours, the I2C connection seems to be going strong with the software GPIO bus at 10kHz, so I'm comfortable calling this done and starting to roll it out to the wider fleet.

I'm going to clean up the code a bit and do the rollout next.

keenanjohnson commented 2 years ago

Deployed to the Ribbit Network fleet:

[Image: screenshot of the fleet deployment]

Software release: c35bf3ccb713