adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
MIT License

Onewire bus getting interrupted on ESP32-S3 #8949

Open ilikecake opened 4 months ago

ilikecake commented 4 months ago

CircuitPython version

Adafruit CircuitPython 8.2.8 on 2023-11-16; Adafruit Feather ESP32S3 4MB Flash 2MB PSRAM with ESP32S3

Code/REPL

# In this function, busdev is the adafruit_ds18x20.DS18X20 object for the sensor.

def ds18b20_getdata(busdev, busID):
    # Try up to 5 times; read failures such as 'CRC error' arrive as RuntimeError.
    for i in range(5):
        try:
            return busdev.temperature
        except RuntimeError as e:
            print(e)
    return 0    # If we get here, we had 5 consecutive read failures.

Behavior

Error messages 'CRC error' printed to REPL. My code runs that function once per minute, and I see at least one CRC error just about every time I run it.
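For context on what a 'CRC error' means here: 1-Wire devices such as the DS18B20 protect their ROM code and scratchpad with the Dallas/Maxim CRC-8 (polynomial x^8 + x^5 + x^4 + 1). Below is a minimal plain-Python sketch of that check. It is not the code that CircuitPython or the adafruit_ds18x20 driver runs, and the function name is made up for illustration.

```python
def crc8_maxim(data):
    """Dallas/Maxim CRC-8, processed LSB first (reflected polynomial 0x8C)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x01:
                crc = (crc >> 1) ^ 0x8C
            else:
                crc >>= 1
    return crc


def crc_ok(payload_with_crc):
    """True if the last byte is the CRC of the bytes before it."""
    # Running the CRC over data plus its own CRC byte always yields 0.
    return crc8_maxim(payload_with_crc) == 0
```

A single bit misread during an interrupted transaction makes `crc_ok` return False, which is presumably the condition the driver reports as 'CRC error'.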

Description

Something (I am assuming the RTOS, but I could be wrong) appears to be interrupting the OWI writes.

As an example, the first part of the read temperature command to the DS18B20 device is supposed to look like this:

<reset>
<presence pulse from device>
<Command 0x55 (match ROM)>
<Controller sends family code and serial number of device>
<Function command 0x44 (begin conversion)>

On the bus, it looks like this: image

However, I have occasionally seen something interrupt the OWI bus for about 200 ms. In the example below, the family code being sent is interrupted. The trace view is pretty useless with this much of a delay, but the logic analyzer shows the issue:

image

The trace view for this is pretty uninformative, but you can see the long delay. The equivalent data packet to the 'good' trace above is between the two purple markers below:

image

Depending on when this delay happens, it has inconsistent effects on the bus.
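For reference, the bytes the controller writes in the sequence listed above (after the reset and presence pulse) can be sketched as follows. This is only an illustration: the function name and the example ROM code are made up, and the last ROM byte is a placeholder rather than a valid CRC. The family code 0x28 is the DS18B20's.

```python
MATCH_ROM = 0x55   # ROM command: address one device by its 64-bit ROM code
CONVERT_T = 0x44   # DS18B20 function command: begin temperature conversion


def build_convert_frame(rom):
    """Return the bytes written after reset/presence to start a conversion.

    rom is the 8-byte ROM code: family code, 6-byte serial number, CRC.
    """
    rom = bytes(rom)
    if len(rom) != 8:
        raise ValueError("ROM code must be 8 bytes")
    return bytes([MATCH_ROM]) + rom + bytes([CONVERT_T])


# Hypothetical ROM code; the final byte is a placeholder, not a real CRC.
example_rom = bytes([0x28, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x00])
frame = build_convert_frame(example_rom)
print(" ".join("{:02X}".format(b) for b in frame))
```

That is ten bytes, or 80 bit slots, so a stall can land anywhere in the command, the ROM code, or the function command.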

Additional information

No response

tannewt commented 4 months ago

Are you running WiFi? I think all ports use our bitbanged OneWire implementation. Any idea how it would be done with ESP-IDF?

ilikecake commented 4 months ago

I am using WiFi in this application. I see that you are disabling interrupts in the first line of that function. For the ESP controllers, I think that leads here. It has been a while since I have messed with FreeRTOS, but I thought that portENTER_CRITICAL should prevent the scheduler from interrupting those lines. I don't know where that is defined for the ESP boards, though. I also have no idea how this is implemented on dual core CPUs. Is it possible that the portENTER_CRITICAL function is not working as intended?

I have zero experience programming the ESP-based CPUs. If I were implementing the 1-wire protocol on a controller I knew, I would probably use some combination of hardware timers to generate delays and interrupts to toggle pin states. However, based on what I know of FreeRTOS, wouldn't the scheduler be able to interrupt those as well?

ilikecake commented 4 months ago

Hmm, I would have to stare at it a lot more, but there is a driver here that implements the 1-wire protocol using the RMT peripheral in the ESP32 CPU. If that module operates independently of the CPU once the data is loaded, that might be a solution?
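To make the idea concrete: a peripheral that plays back preloaded pulse durations would need each bit described as a (low, high) pair. The sketch below shows only that translation in plain Python. It does not use the ESP-IDF RMT API, the names are made up, and the timings are assumptions taken from the standard-speed values commonly recommended in Maxim's application note 126.

```python
# Assumed standard-speed timings in microseconds (not taken from this thread).
WRITE1_LOW_US = 6     # write '1': short low pulse...
WRITE1_HIGH_US = 64   # ...then release for the rest of the slot
WRITE0_LOW_US = 60    # write '0': hold low for most of the slot...
WRITE0_HIGH_US = 10   # ...then a short recovery time


def byte_to_slots(value):
    """Translate one byte into eight (low_us, high_us) pairs, LSB first."""
    slots = []
    for bit in range(8):
        if (value >> bit) & 1:
            slots.append((WRITE1_LOW_US, WRITE1_HIGH_US))
        else:
            slots.append((WRITE0_LOW_US, WRITE0_HIGH_US))
    return slots
```

For the match ROM command, `byte_to_slots(0x55)` starts with a '1' slot and then alternates. Once such a list is loaded, the pulse timing would no longer depend on the CPU being free at each edge, if the peripheral works the way I am guessing.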

tannewt commented 4 months ago

> I also have no idea how this is implemented on dual core CPUs. Is it possible that the portENTER_CRITICAL function is not working as intended?

Running CP on the second core should insulate it from this.

My understanding is that the timing within each OneWire bit is critical, but the timing between bits is not. Is that true? It is possible that a Python garbage collection is causing a delay between bits.

Definitely try on the latest 9.x builds as well. A lot has changed under the hood for ESP.
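One way to test the garbage-collection theory above is to force a collection right before each read and time the read itself. This is a rough sketch, not a fix: the helper name is made up, and collecting first only makes a collection during the transaction less likely rather than impossible.

```python
import gc
import time


def timed_read(read_fn, collect_first=True):
    """Call read_fn(), optionally after a forced collection.

    Returns (value, elapsed_seconds) so slow reads can be logged.
    """
    if collect_first:
        gc.collect()  # collect now rather than in the middle of the transaction
    start = time.monotonic()
    value = read_fn()
    elapsed = time.monotonic() - start
    return value, elapsed
```

With the sensor object from the original report this would be called as `temp, elapsed = timed_read(lambda: busdev.temperature)`. If the long stalls still show up with `collect_first=True`, garbage collection is probably not the main cause.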

ilikecake commented 4 months ago

Well, I switched to:

Adafruit CircuitPython 9.0.0-beta.2 on 2024-02-20; Adafruit Feather ESP32S3 4MB Flash 2MB PSRAM with ESP32S3

And it seems to have improved the situation. The bus is still getting interrupted sometimes, but the CRC errors are very uncommon.

image (interrupted on receiving data bytes.)

Every capture that I have done has shown a delay like that somewhere in the transaction (reading temperature from two DS18B20 sensors).

This does not seem like an ideal situation, but I am not sure how to fix it. I suppose you could disable interrupts for the duration of the _readbyte or _writebyte functions.

Scoping the bus has also revealed another potential problem that I will mention. It appears that the first bit in a byte sent can have inconsistent timing. For example, the datasheet defines a '1' bit as the controller pulling the line low and releasing it within 15 µs. The bitbanging function appears to be trying to pull the line low for 6 µs. In reality, I am measuring 9 µs most of the time, but the first bit of a written byte seems to take longer for some reason.

Here the controller is sending the 'match ROM' command 0b01010101 (bits are transmitted LSB first):

image

The first (LSB) bit is showing an 18 µs low time. The logic analyzer seems to think that is close enough to consider it a valid command, but sometimes the delay can get even longer.

image

The low time for the first bit is now 24 µs. This is far enough out of spec that the analyzer considers it a '0'.

It seems like the device is still getting the message, but it is hard to be sure.
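To put the measured low times next to the spec, here is a small sketch that classifies a write slot by how long the controller holds the line low. The 15 µs limit is the one quoted above. The 60 to 120 µs window for a '0' is my reading of the DS18B20 datasheet, so treat it as an assumption, and the names are made up.

```python
WRITE1_MAX_LOW_US = 15    # '1': controller must release the line within 15 us
WRITE0_MIN_LOW_US = 60    # '0': controller holds the line low at least 60 us (assumed)
WRITE0_MAX_LOW_US = 120   # upper bound of a write slot (assumed)


def classify_low_time(low_us):
    """Classify a write slot by its low time in microseconds."""
    if 0 < low_us <= WRITE1_MAX_LOW_US:
        return "1"
    if WRITE0_MIN_LOW_US <= low_us <= WRITE0_MAX_LOW_US:
        return "0"
    return "out of spec"


for measured in (9, 18, 24):
    print(measured, classify_low_time(measured))
```

By this strict reading, 9 µs is a valid '1' while 18 µs and 24 µs are neither a valid '1' nor a valid '0'. As far as I can tell from the datasheet, the device samples the line somewhere between 15 and 60 µs after the falling edge, which would explain why a slightly long '1' often still gets through.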

tannewt commented 4 months ago

> It appears that the first bit in a byte sent can have inconsistent timing.

This might correspond to loading code off of the flash chip because it isn't in RAM. You could try capturing the flash's CS line as well.

ilikecake commented 4 months ago

I am not sure I understand. Is there a separate flash chip that is loading the code before it executes it? Do I have access to the CS line to scope that? I thought the memory was inside the ESP32 module?

tannewt commented 3 months ago

The separate flash may be under the can on the module. A few chips even have them in the package.