`ulp_riscv_i2c_master_write_to_device` fails on first write (IDFGH-13096)

sidwarkd commented 4 months ago

Answers checklist.

[X] I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
[X] I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
[X] I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v5.1.1

Espressif SoC revision.

ESP32-S3

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

Custom Board

Power Supply used.

USB

What is the expected behavior?

ulp_riscv_i2c_master_write_to_device should properly write to the I2C device on the first write attempt.

What is the actual behavior?

Most of the time the first call to ulp_riscv_i2c_master_write_to_device fails with all subsequent calls working just fine. The following is produced in the output.

E (345) ulp_riscv_i2c: Write Failed!
E (345) ulp_riscv_i2c: RTC I2C Interrupt Raw Reg 0x12c
E (345) ulp_riscv_i2c: RTC I2C Status Reg 0x66860000

Steps to reproduce.

I have created a Github repository that is a minimal sample that allows you to recreate the issue. In this case all you need is an ESP32-S3 development board with an ENS210 sensor connected to the I2C bus. Then simply build and run the project found at https://github.com/sidwarkd/rtc_i2c_issue.

This issue was first reported in #11037

Debug Logs.

The relevant debug logs can be found in [the repro repo](https://github.com/sidwarkd/rtc_i2c_issue) but here is the relevant section:

I (315) main_task: Calling app_main()
I (315) main: Initializing RTC I2C ... 
I (325) main: ok
I (325) main: Initializing ENS210 sensor ... 
I (335) main: Writing to SYS_CTRL register
E (345) ulp_riscv_i2c: Write Failed!
E (345) ulp_riscv_i2c: RTC I2C Interrupt Raw Reg 0x12c
E (345) ulp_riscv_i2c: RTC I2C Status Reg 0x66860000
I (405) main: Writing to SYS_CTRL register
I (465) main: Writing to SYS_CTRL register
I (525) main: Writing to SENS_RUN register
I (585) main: Writing to SYS_CTRL register to enable low power mode
I (645) main: Writing to SENS_START register
I (715) main: Status: 0x01

I (735) main: ENS210 Device ID: 0x0210

I (735) main: ok
I (935) main: Reading temperature and humidity data from ENS210 sensor ...
I (995) main: Temperature = 24.178 deg Celsius
I (995) main: Humidity = 25.898%

More Information.

Introducing a Delay

The first thing I tried was to introduce a delay between the successful init of the I2C bus and the first call to ulp_riscv_i2c_master_write_to_device. Even with delays as large as 1 second the behavior was the same. Most of the time the first write fails with an occasional success.

Looking at IDF Return Codes

Since the public API for the call to ulp_riscv_i2c_master_write_to_device returns a void I dug into the function itself and found the reason the "Write Failed" is printed is due to a failed call to ulp_riscv_i2c_wait_for_interrupt(ULP_RISCV_I2C_RW_TIMEOUT). Digging into that function I found that the error is not due to a timeout but due to the inspection of bits of the RTC_I2C_INT_ST_REG.

I added code to print the value of that register and found that in the successful runs it reads as 0x80 or 0x40 and in the failed runs it reads as 0x20.

The code implies that after reading the RTC_I2C_INT_ST_REG the bits it's inspecting are related to TX and RX data states. However, I was unable to see that same information in the ESP32 Technical Reference Manual. Here is the relevant section from the manual describing the RTC_I2C_INT_ST_REG.

The values of the bits in the register do not match the names of the bits being checked in the code.

At this point I've found the peripheral to be extremely stable after the first failure and would like more insight as to why the first write fails most of the time.

Unfortunately, since the public API for the write returns a void, it's not as simple as doing a retry which, in my experience, works 100% of the time. To work around this issue we simply write the first register twice. Another option would be to hand modify the IDF code to surface the underlying error but we would rather not have a custom build of the IDF.

o-marshmallow commented 4 months ago

Hello @sidwarkd ,

What you are looking at is the ESP32 Technical Reference Manual, not the ESP32-S3's. If you look at the ESP32-S3 Technical Reference Manual, taken from https://www.espressif.com/sites/default/files/documentation/esp32-s3_technical_reference_manual_en.pdf , the RTC I2C interrupt status register matches what you see in the code:

According to the value of the status register during a failure, 0x20, the error encountered is RTC_I2C_ACK_ERR_INT_ST. In other words, the RTC I2C did not receive an acknowledgement from your device during the first transaction.

From what I see in the timing parameters you configured here https://github.com/sidwarkd/rtc_i2c_issue/blob/main/main/main.cpp#L55, you are targetting 400KHz, but I also see that you enabled internal pull-up resistors. Do you have any external pull-ups?

Your device also requires a ~1ms delay to boot. I wonder if the device expect an incoming clock to perform its boot, which would also explain why it is not responding for the first transaction.

Do you happen to have a logic analyzer to see what is going on on the bus?

sidwarkd commented 4 months ago

@o-marshmallow Thanks so much for the clarification on the technical manuals. Tackling each of these.

Yes, we do have external pullups. I modified the code to disable those and still see the exact same behavior. No change.

The 1ms delay to boot shouldn't be an issue here as I introduced an artificial 1s delay after I2C bus initialization at one point during troubleshooting which had no affect on the error. I couldn't find anything in the datasheet that implies the device needs clocking from the I2C bus to boot.

I do have a logic analyzer but due to the layout of the board it will take me some time to set up. I will post results here when I have them.

o-marshmallow commented 4 months ago

@sidwarkd Which values did you choose for the external pull-ups?

I couldn't find any other possible explanation from the device datasheet either.

Maybe you could try with an oscilloscope if it happens to be easier to probe (having test points on boards can always help :smile: )

sidwarkd commented 4 months ago

@o-marshmallow Ok, I have some more data to share here. After putting the logic analyzer on the device I found that some unexpected activity on the I2C bus generates a "start condition" which then messes up the first write. Subsequent operations go through just fine. At this point I started pairing down the code to a bare minimum and found some interesting things.

First, I created a main function that did nothing except print "Done". When powering the device on and off via USB power I got no spurious data. However, when I used idf.py monitor to reset the device and read logs I would get the equivalent of a start condition on the I2C bus every single time. I'm not sure if this is hardware related or not and would need to try on another device to validate. Mentioning it here only for reference as I found it interesting.

Next, I added just code to initialize the I2C bus which is contained in my init_i2c function. All it does is call ulp_riscv_i2c_master_init with the config. I found that this generates a start condition 100% of the time.

Next, I reintroduced my call to init_ens210 which does all of the writes to initialize the sensor. However, because of the "start" caused by ulp_riscv_i2c_master_init call it appears things get confused and the bus interprets a read command instead of a write.

This led me to an article about I2C bus locking. As well as another one that discusses the same thing. This made me think that the erroneous start was causing the ENS210 state machine to get stuck in a bad state waiting for a transaction to finish.

So next, I implemented a routine called reset_i2c_bus which clocks 16 pulses on the SCL line and then generates a stop condition. Using the GPIO interface didn't work (presumable because the I2C peripheral owns those pins at this point) so I implemented the function with a dummy call to an I2C device that doesn't exist on the bus at address 0x00. I call this immediately after the call to ulp_riscv_i2c_master_init.

    uint8_t data[2] = {0};
    ulp_riscv_i2c_master_set_slave_addr(0x00);
    ulp_riscv_i2c_master_set_slave_reg_addr(0x00);
    ulp_riscv_i2c_master_write_to_device(data, 2);

This results, as expected, in a "Write Failed!" debug log on the imaginary 0x00 device but then all of my ENS210 writes and reads work without issue.

That said, the Saleae analyzer, doesn't always show a proper start condition for I2C transactions. Here is an example of a write that succeeded but doesn't show the proper start (indicated by a green circle at the start of the transaction).

But a subsequent write does, in fact, have a proper start

About half of all transactions don't show timing of a proper start although the transaction succeeds and the device works as expected.

At this point I suppose I'm unblocked on the write failure using the hack to clear the bus which is required because of the unexpected "start" generated by initializing the ULP I2C peripheral.

It still feels like there is an issue with the I2C peripheral generating an unexpected start on the bus as well as not always having proper start conditions for transactions.

Questions

Is it expected that ulp_riscv_i2c_master_init generates a start condition on the I2C bus?
Is there a more proper way to reset the I2C bus?
Are proper start conditions expected to be generated by the ULP I2C peripheral?

Let me know if there is anything else you would like me to test or try.

sidwarkd commented 4 months ago

@o-marshmallow Also, you asked about pullup value which I failed to mention. There are 4.7k pullups on SDA and SCL.

sudeep-mohanty commented 4 months ago

Hi @sidwarkd, Thank you for sharing your detailed analysis with us. Here are some answers to your question -

However, when I used idf.py monitor to reset the device and read logs I would get the equivalent of a start condition on the I2C bus every single time.

This is because idf.py monitor always resets the target by default. It is quite possible that you may observe an I2C start condition when this happens. To avoid reseting the target, you may use the --no-reset option with idf.py monitor.

Is it expected that ulp_riscv_i2c_master_init generates a start condition on the I2C bus?

At this moment, I can't conclude why would you see a spurious start when you initialize the I2C. I shall investigate it further and let you know of any findings.

sidwarkd commented 4 months ago

@o-marshmallow Thanks for the quick reply, I look forward to your investigation results on the unexpected start. Would also be good to get clarification on the regular transactions, which work, but that don't appear to have proper start commands according to the logic analyzer capture. Thanks again for your help and investigation.

EDIT: After digging into the source of ulp_riscv_i2c.c it would appear that neither the read nor write functions for the ULP I2C driver send a proper start command despite the function documentation just above each function implying it should. My testing confirms that, at least in the case of the ENS210, this still works fine but want to verify that it's what is expected of the driver.

sudeep-mohanty commented 3 months ago

Hi @sidwarkd, I can confirm that I do notice a spurious I2C start condition when the I2C is initialized. Initial debugging points to order of how the I2C pins are initialized. I am working on a fix for it and would update you once I have some breakthrough.

After digging into the source of ulp_riscv_i2c.c it would appear that neither the read nor write functions for the ULP I2C driver send a proper start command despite the function documentation just above each function implying it should. My testing confirms that, at least in the case of the ENS210, this still works fine but want to verify that it's what is expected of the driver.

About the above comment, I don't see this happening at my end. I see well defined Start and Stop conditions being generated on the I2C bus, which is expected. Infact, had there not been such delimiters, the I2C transaction would most definitely fail as the sensor would never know when the transaction begins or ends. I would have to suspect that this maybe a logic-analyzer anomaly.

sidwarkd commented 3 months ago

@o-marshmallow Thanks for confirming this on the unexpected start. I'll look forward to updates as they are available.

As for the other start condition issue, I think you are right as the sensor works flawlessly and the signals appear really close together in the logic analyzer output (close to indicating a proper start I mean). I just looked at it again and my Saleae has a max sample rate of 500Ks/s which results in a resolution of 2us. In Fast Mode (400kHz) it definitely is pushing the limits of the resolution which I think explains what I'm seeing there.

Long-Zhang-dev commented 3 months ago

Hi， I found this problem too, but I notice it should be the rtc_gpio_init() problem, because in setting scl/sda pin, the default rtc gpio status is input without pullup, so before rtc_gpio_init, I try to set gpio pullup before init i2c(must set pull up first, then init gpio, to avoid the line act low), this problem will disaprear. here are my code ` rtc_gpio_pullup_en(PIN_NUM_I2C_SCL); rtc_gpio_pullup_en(PIN_NUM_I2C_SDA); rtc_gpio_pulldown_dis(PIN_NUM_I2C_SCL); rtc_gpio_pulldown_dis(PIN_NUM_I2C_SDA); rtc_gpio_set_direction(PIN_NUM_I2C_SCL, RTC_GPIO_MODE_INPUT_OUTPUT_OD); rtc_gpio_set_direction(PIN_NUM_I2C_SDA, RTC_GPIO_MODE_INPUT_OUTPUT_OD); rtc_gpio_init(PIN_NUM_I2C_SCL); rtc_gpio_init(PIN_NUM_I2C_SDA);

const ulp_riscv_i2c_cfg_t ulp_i2c = ULP_RISCV_I2C_DEFAULT_CONFIG();
ulp_riscv_i2c_master_init(&ulp_i2c);

`

sudeep-mohanty commented 3 months ago

Thanks for the analysis @Long-Zhang-dev and @sidwarkd. This is indeed a problem with the way the SCL and SDA pins are initialized. We need to set the output level high for the pin. And also as a precaution, we initialize the SCL pin before the SDA pin. With these changes, the spurious start condition is not seen.

We have a fix in progress which should be available to the master branch and relevant release branches once it passes through our tests. However, to unblock your work and to test the solution, please find attached a patch with the same fix here -

lp_i2c_spurious_start.patch

espressif / esp-idf