Closed sidwarkd closed 3 months ago
Hello @sidwarkd ,
What you are looking at is the ESP32 Technical Reference Manual, not the ESP32-S3's. If you look at the ESP32-S3 Technical Reference Manual, taken from https://www.espressif.com/sites/default/files/documentation/esp32-s3_technical_reference_manual_en.pdf , the RTC I2C interrupt status register matches what you see in the code:
According to the value of the status register during a failure, 0x20
, the error encountered is RTC_I2C_ACK_ERR_INT_ST
. In other words, the RTC I2C did not receive an acknowledgement from your device during the first transaction.
From what I see in the timing parameters you configured here https://github.com/sidwarkd/rtc_i2c_issue/blob/main/main/main.cpp#L55, you are targetting 400KHz, but I also see that you enabled internal pull-up resistors. Do you have any external pull-ups?
Your device also requires a ~1ms delay to boot. I wonder if the device expect an incoming clock to perform its boot, which would also explain why it is not responding for the first transaction.
Do you happen to have a logic analyzer to see what is going on on the bus?
@o-marshmallow Thanks so much for the clarification on the technical manuals. Tackling each of these.
Yes, we do have external pullups. I modified the code to disable those and still see the exact same behavior. No change.
The 1ms delay to boot shouldn't be an issue here as I introduced an artificial 1s delay after I2C bus initialization at one point during troubleshooting which had no affect on the error. I couldn't find anything in the datasheet that implies the device needs clocking from the I2C bus to boot.
I do have a logic analyzer but due to the layout of the board it will take me some time to set up. I will post results here when I have them.
@sidwarkd Which values did you choose for the external pull-ups?
I couldn't find any other possible explanation from the device datasheet either.
Maybe you could try with an oscilloscope if it happens to be easier to probe (having test points on boards can always help :smile: )
@o-marshmallow Ok, I have some more data to share here. After putting the logic analyzer on the device I found that some unexpected activity on the I2C bus generates a "start condition" which then messes up the first write. Subsequent operations go through just fine. At this point I started pairing down the code to a bare minimum and found some interesting things.
First, I created a main
function that did nothing except print "Done". When powering the device on and off via USB power I got no spurious data. However, when I used idf.py monitor
to reset the device and read logs I would get the equivalent of a start condition on the I2C bus every single time. I'm not sure if this is hardware related or not and would need to try on another device to validate. Mentioning it here only for reference as I found it interesting.
Next, I added just code to initialize the I2C bus which is contained in my init_i2c
function. All it does is call ulp_riscv_i2c_master_init
with the config. I found that this generates a start condition 100% of the time.
Next, I reintroduced my call to init_ens210
which does all of the writes to initialize the sensor. However, because of the "start" caused by ulp_riscv_i2c_master_init
call it appears things get confused and the bus interprets a read command instead of a write.
This led me to an article about I2C bus locking. As well as another one that discusses the same thing. This made me think that the erroneous start was causing the ENS210 state machine to get stuck in a bad state waiting for a transaction to finish.
So next, I implemented a routine called reset_i2c_bus
which clocks 16 pulses on the SCL line and then generates a stop condition. Using the GPIO interface didn't work (presumable because the I2C peripheral owns those pins at this point) so I implemented the function with a dummy call to an I2C device that doesn't exist on the bus at address 0x00. I call this immediately after the call to ulp_riscv_i2c_master_init
.
uint8_t data[2] = {0};
ulp_riscv_i2c_master_set_slave_addr(0x00);
ulp_riscv_i2c_master_set_slave_reg_addr(0x00);
ulp_riscv_i2c_master_write_to_device(data, 2);
This results, as expected, in a "Write Failed!" debug log on the imaginary 0x00 device but then all of my ENS210 writes and reads work without issue.
That said, the Saleae analyzer, doesn't always show a proper start condition for I2C transactions. Here is an example of a write that succeeded but doesn't show the proper start (indicated by a green circle at the start of the transaction).
But a subsequent write does, in fact, have a proper start
About half of all transactions don't show timing of a proper start although the transaction succeeds and the device works as expected.
At this point I suppose I'm unblocked on the write failure using the hack to clear the bus which is required because of the unexpected "start" generated by initializing the ULP I2C peripheral.
It still feels like there is an issue with the I2C peripheral generating an unexpected start on the bus as well as not always having proper start conditions for transactions.
ulp_riscv_i2c_master_init
generates a start condition on the I2C bus?Let me know if there is anything else you would like me to test or try.
@o-marshmallow Also, you asked about pullup value which I failed to mention. There are 4.7k pullups on SDA and SCL.
Hi @sidwarkd, Thank you for sharing your detailed analysis with us. Here are some answers to your question -
However, when I used idf.py monitor to reset the device and read logs I would get the equivalent of a start condition on the I2C bus every single time.
This is because idf.py monitor
always resets the target by default. It is quite possible that you may observe an I2C start condition when this happens. To avoid reseting the target, you may use the --no-reset
option with idf.py monitor
.
Is it expected that ulp_riscv_i2c_master_init generates a start condition on the I2C bus?
At this moment, I can't conclude why would you see a spurious start when you initialize the I2C. I shall investigate it further and let you know of any findings.
@o-marshmallow Thanks for the quick reply, I look forward to your investigation results on the unexpected start. Would also be good to get clarification on the regular transactions, which work, but that don't appear to have proper start commands according to the logic analyzer capture. Thanks again for your help and investigation.
EDIT: After digging into the source of ulp_riscv_i2c.c it would appear that neither the read nor write functions for the ULP I2C driver send a proper start command despite the function documentation just above each function implying it should. My testing confirms that, at least in the case of the ENS210, this still works fine but want to verify that it's what is expected of the driver.
Hi @sidwarkd, I can confirm that I do notice a spurious I2C start condition when the I2C is initialized. Initial debugging points to order of how the I2C pins are initialized. I am working on a fix for it and would update you once I have some breakthrough.
After digging into the source of ulp_riscv_i2c.c it would appear that neither the read nor write functions for the ULP I2C driver send a proper start command despite the function documentation just above each function implying it should. My testing confirms that, at least in the case of the ENS210, this still works fine but want to verify that it's what is expected of the driver.
About the above comment, I don't see this happening at my end. I see well defined Start and Stop conditions being generated on the I2C bus, which is expected. Infact, had there not been such delimiters, the I2C transaction would most definitely fail as the sensor would never know when the transaction begins or ends. I would have to suspect that this maybe a logic-analyzer anomaly.
@o-marshmallow Thanks for confirming this on the unexpected start. I'll look forward to updates as they are available.
As for the other start condition issue, I think you are right as the sensor works flawlessly and the signals appear really close together in the logic analyzer output (close to indicating a proper start I mean). I just looked at it again and my Saleae has a max sample rate of 500Ks/s which results in a resolution of 2us. In Fast Mode (400kHz) it definitely is pushing the limits of the resolution which I think explains what I'm seeing there.
Hi, I found this problem too, but I notice it should be the rtc_gpio_init() problem, because in setting scl/sda pin, the default rtc gpio status is input without pullup, so before rtc_gpio_init, I try to set gpio pullup before init i2c(must set pull up first, then init gpio, to avoid the line act low), this problem will disaprear. here are my code ` rtc_gpio_pullup_en(PIN_NUM_I2C_SCL); rtc_gpio_pullup_en(PIN_NUM_I2C_SDA); rtc_gpio_pulldown_dis(PIN_NUM_I2C_SCL); rtc_gpio_pulldown_dis(PIN_NUM_I2C_SDA); rtc_gpio_set_direction(PIN_NUM_I2C_SCL, RTC_GPIO_MODE_INPUT_OUTPUT_OD); rtc_gpio_set_direction(PIN_NUM_I2C_SDA, RTC_GPIO_MODE_INPUT_OUTPUT_OD); rtc_gpio_init(PIN_NUM_I2C_SCL); rtc_gpio_init(PIN_NUM_I2C_SDA);
const ulp_riscv_i2c_cfg_t ulp_i2c = ULP_RISCV_I2C_DEFAULT_CONFIG();
ulp_riscv_i2c_master_init(&ulp_i2c);
`
Thanks for the analysis @Long-Zhang-dev and @sidwarkd. This is indeed a problem with the way the SCL and SDA pins are initialized. We need to set the output level high for the pin. And also as a precaution, we initialize the SCL pin before the SDA pin. With these changes, the spurious start condition is not seen.
We have a fix in progress which should be available to the master branch and relevant release branches once it passes through our tests. However, to unblock your work and to test the solution, please find attached a patch with the same fix here -
Answers checklist.
IDF version.
v5.1.1
Espressif SoC revision.
ESP32-S3
Operating System used.
Linux
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
Custom Board
Power Supply used.
USB
What is the expected behavior?
ulp_riscv_i2c_master_write_to_device
should properly write to the I2C device on the first write attempt.What is the actual behavior?
Most of the time the first call to
ulp_riscv_i2c_master_write_to_device
fails with all subsequent calls working just fine. The following is produced in the output.Steps to reproduce.
I have created a Github repository that is a minimal sample that allows you to recreate the issue. In this case all you need is an ESP32-S3 development board with an ENS210 sensor connected to the I2C bus. Then simply build and run the project found at https://github.com/sidwarkd/rtc_i2c_issue.
This issue was first reported in #11037
Debug Logs.
More Information.
Introducing a Delay
The first thing I tried was to introduce a delay between the successful init of the I2C bus and the first call to
ulp_riscv_i2c_master_write_to_device
. Even with delays as large as 1 second the behavior was the same. Most of the time the first write fails with an occasional success.Looking at IDF Return Codes
Since the public API for the call to
ulp_riscv_i2c_master_write_to_device
returns a void I dug into the function itself and found the reason the "Write Failed" is printed is due to a failed call toulp_riscv_i2c_wait_for_interrupt(ULP_RISCV_I2C_RW_TIMEOUT)
. Digging into that function I found that the error is not due to a timeout but due to the inspection of bits of theRTC_I2C_INT_ST_REG
.I added code to print the value of that register and found that in the successful runs it reads as 0x80 or 0x40 and in the failed runs it reads as 0x20.
The code implies that after reading the
RTC_I2C_INT_ST_REG
the bits it's inspecting are related to TX and RX data states. However, I was unable to see that same information in the ESP32 Technical Reference Manual. Here is the relevant section from the manual describing theRTC_I2C_INT_ST_REG
.The values of the bits in the register do not match the names of the bits being checked in the code.
At this point I've found the peripheral to be extremely stable after the first failure and would like more insight as to why the first write fails most of the time.
Unfortunately, since the public API for the write returns a void, it's not as simple as doing a retry which, in my experience, works 100% of the time. To work around this issue we simply write the first register twice. Another option would be to hand modify the IDF code to surface the underlying error but we would rather not have a custom build of the IDF.