espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.63k stars 7.28k forks source link

ESP32-S3 I2C instability v ESP32 when using `i2c_master_xxx()` API (IDFGH-10122) #11397

Closed RobMeades closed 1 year ago

RobMeades commented 1 year ago

Answers checklist.

IDF version.

v5.0.1 (also occurs on master as at https://github.com/espressif/esp-idf/commit/54576b7528b182256c027de86eb605a172bc2821)

Operating System used.

Windows

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

CMD

Development Kit.

ESP32-S3-DevKitC-1 v1.0

Power Supply used.

USB

What is the expected behavior?

Code written for ESP-IDF I2C i2c_master_xxx() API and runs without error on ESP32 should also run without error when compiled for ESP32-S3.

What is the actual behavior?

[A few of your customers are seeing this problem, discussed here: https://www.esp32.com/viewtopic.php?f=2&t=33583 but we need attention from an Espressif person, hence moving it here.]

We have a GNSS device connected over I2C to an ESP32 (I2C HW block 0, 100 kHz clock, no clock stretching, SDA pin 18, SCL pin 19). Our automated test system checks that our code works perfectly all the time; the tests run for about 6 minutes. This works flawlessly.

However, when we build for and run on ESP32-S3 in exactly the same situation, we see instability: i2c_master_cmd_begin() sometimes returns error 263 (timeout) or occasionally -1 (no ack from device); it never passes the set of test.

To illustrate the problem, see below a Saleae Logic trace of an entire test run where the red line indicates that i2c_master_cmd_begin() is running, the orange and yellow blips indicate that a timeout error has been returned by i2c_master_cmd_begin() (orange for when the ESP32-S3 is doing a "send" to the device, yellow when the ESP32-S3 is doing a "receive" from the device) and the green blips indicate that "no ack from device" has been returned by i2c_master_cmd_begin():

image

Zooming in on the first of these errors (.sal file also attached below):

image

...you can see that a write has been set up to address 0x42 and the device has acked it (at least as far as the Saleae probe can tell) but then there is no clocking for the rest of the write, which I guess results in the timeout. All of the timeout errors on send, the yellow blips, look like this.

Zooming in on the second of these errors (.sal file also attached below):

image

...again, the Saleae probe believes that a read has been set up to address 0x42 and the first byte (which is all zeroes) has been received and acked but then the clock stops again and i2c_master_cmd_begin() returns timeout. Curiously, 16 ms later, and not while i2c_master_cmd_begin() is running, the clock returns high, which the slave probably sees as a signal to get on with things and pulls the data bus low, but then the clock stays high (I guess this is some sort of error recovery thing, rather than a clock edge). In some cases this confuses the devil out of the bus and results in a nack the next time. All of the timeout errors on receive, the orange blips, look like this.

The failures occur randomly, there is no discernable pattern to them.

For reference, here is a zoom into the same Saleae trace of a success case for send:

image

...and a success case for receive:

image

What could be causing this behaviour? Do we need to setup something differently to use the i2c_master_xxx() API on ESP32-S3? We've tried setting i2c_set_timeout() to the max value of 0x1f (see post here asking why the register has shrunk in size so much for ESP32-S3 versus ESP32) but, as expected, adding such a very short delay makes no difference to the outcome.

Steps to reproduce.

The entire code can be found here: https://github.com/u-blox/ubxlib/tree/debug_esp32s3_i2c, where this is the function that opens the I2C bus, this is the function that does a send, and this is the function that does a receive.

Also find attached segments of the Saleae probe trace for the send and receive cases (including the analogue waveform, which I didn't bother including above).

esp32s3_i2c_error_send.zip esp32s3_i2c_error_receive.zip

Debug Logs.

See Saleae logic probe snapshots and actual .sal trace segments above. The entire Saleae logic trace is also available but is too big to attach here (261 Mbytes): let me know if you want it.

More Information.

Just out of interest, I added some code to read the configuration values (the ones we don't change, assuming the defaults are good, plus the timeout value which we set on ESP32 but don't bother setting on ESP32-S3 because its range is too small to make any difference). Not sure if any of these matter - should we be changing them on ESP32-S3 for I2C to operate correctly?

ESP32:
clock period: high 387, low 399.
start setup time: 400, hold time 400.
stop setup time: 400, hold time 400.
data sample time: 200, hold time 200.
timeout: 800000 (we set this value, 10 ms).
ESP32-S3
clock period: high 200, low 200.
start setup time 199, hold time 200.
stop setup time 199, hold time 199.
data sample time 197, hold time 49 (tried changing this to 197, since that is closer to the ESP32 case, but it didn't help).
timeout 12 (== 150 ns, max on S3 is 31 == 386 ns).
o-marshmallow commented 1 year ago

Hello @RobMeades ,

To me the issue you are encoutnering sounds very similar to this one: https://github.com/espressif/esp-idf/issues/9777

In fact, we found a bug in the I2C HAL layer that also affected the S3, a fix was pushed on master branch and it is being backported to other branches: https://github.com/espressif/esp-idf/commit/a0f8434f9382016db6f3b102051569b5cbaea2c0

In the meanwhile, you can cherry-pick it and see if it solves your issue:

git cherry-pick a0f8434f9382016db6f3b102051569b5cbaea2c0
RobMeades commented 1 year ago

Hi, and thanks for the swift response. I had discounted that particular issue since there was no error on the I2C bus preceding these errors. However, I can try just doing the same thing on headrev of your master branch and see what happens, will report here on the outcome.

RobMeades commented 1 year ago

Unfortunately, moving to the current master (54576b7528b182256c027de86eb605a172bc2821) does not solve the problem, it still looks the same.

RobMeades commented 1 year ago

In case it helps, find below a high resolution trace (with your master branch), including analogue @ 50 M samples/second, taken at the pins of the ESP32-S3 Devkit, of the parts where the errors occur.

Here is the "sending to device" case, i.e. zooming into here:

image

image

...and here is the "receiving from device" case, i.e. zooming into here:

image

image

Corresponding Saleae logic traces also attached: esp32s3_i2c_error_send_detail.zip and esp32s3_i2c_error_receive_detail.zip.

Looks pretty clean, electrically speaking, to me.

o-marshmallow commented 1 year ago

@RobMeades Thank you for all the details and screenshots. I tried on my side with your send function, the same SDA/SCL pin, same frequency and with internal pull-ups (no external pull-ups) and a GPIO expander device, commit https://github.com/espressif/esp-idf/commit/54576b7528b182256c027de86eb605a172bc2821 but I couldn't reproduce the issue unfortunately. In my case, the transfers are all successful, even if I execute them with tight interval: Screenshot from 2023-05-15 18-54-51

I suspect it's a timing configuration issue because, as you said, electrically speaking it is good, the signals seem clean to me. Where did you print/get the timing values you showed in your previous message? These ones:

ESP32-S3
clock period: high 200, low 200.
start setup time 199, hold time 200.
stop setup time 199, hold time 199.
data sample time 197, hold time 49 (tried changing this to 197, since that is closer to the ESP32 case, but it didn't help).
timeout 12 (== 150 ns, max on S3 is 31 == 386 ns).

The data sample time seems too big, may be due to the fact that you retrieved them with the older invalid commit. For comparison, and make sure we are using the same values, here are the detailed timings I am using:

scl_low: 200
scl_wait_high: 98
scl_high: 102
sda_hold: 50
sda_sample: 100
setup: 200
hold: 200
tout: 12

To get them, here is the diff to apply: esp32s3_i2c_timings.diff.txt

Check the timings you get and make sure we have the same configuration.

If they are the same and you still get the same problem, try to isolate the issue with a small ESP-IDF example that uses the I2C API wrappers we provide, something like:

#include <stdio.h>
#include <inttypes.h>
#include "sdkconfig.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/i2c.h"

#define I2C_MASTER_PORT 0
#define DEVICE_ADDRESS 0x42 // un-shifted address!

static esp_err_t i2c_master_init(void)
{
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = 18,
        .scl_io_num = 19,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };

    i2c_param_config(I2C_MASTER_PORT, &conf);

    return i2c_driver_install(I2C_MASTER_PORT, conf.mode, 0, 0, 0);
}

void app_main(void)
{
    uint8_t data[1] = { 0 };
    uint8_t datar[1] = { 0 };
    esp_err_t ret = ESP_OK;

    ret = i2c_master_init();
    if (ret != ESP_OK) {
        printf("Error initializing the I2C\n");
        return;
    }

    while (1) {
        ret = i2c_master_write_read_device(I2C_MASTER_PORT, DEVICE_ADDRESS, data, 1, datar, 1, 1000 / portTICK_PERIOD_MS);
        if (ret != ESP_OK) {
            printf("Error %d sending data\n", ret);
            return;
        }
        printf("Received: %x\n", *datar);

        vTaskDelay(250 / portTICK_PERIOD_MS);
    }

}

As a side note, your send and receive functions can be replaced by i2c_master_write_to_device and i2c_master_read_from_device respectively. In fact, they will even be faster because they don't perform any dynamic memory allocation.

RobMeades commented 1 year ago

Yes, the timing values I posted in the "More Information" section above were retrieved on the 5.0.1 release, done by issuing i2c_get_period(), i2c_get_start_timing(), i2c_get_stop_timing(), i2c_get_data_timing() and i2c_get_timeout().

As you can see from the first trace above, the error occurs about ~20 times in what must be several thousand transactions of various types over a 6 minute period, the real scenario; without knowing the exact circumstances that cause the problem, I suspect that using a simple small example, even repeated many times, won't reproduce it (though thank you for trying!).

Is there any instrumentation I can add to the code I am running which would print out stuff of interest to you when the problem occurs? State information, that kind of thing?

I will also look into your i2c_master_write_to_device() and i2c_master_read_from_device() functions for the future.

EDIT: seems that I am using the same values as you when on master (https://github.com/espressif/esp-idf/commit/54576b7528b182256c027de86eb605a172bc2821); this is what I get from adding your esp_rom_printf() to i2c_ll_cal_bus_clk():

scl_low: 200
scl_wait_high: 98
scl_high: 102
sda_hold: 50
sda_sample: 100
setup: 200
hold: 200
tout: 12

If I repeat the calls to i2c_get_xxx() I did earlier I get:

i2c_get_period() returned high_period 200, low_period 200. i2c_get_start_timing() returned setup_time 199, hold_time 200. i2c_get_stop_timing() returned setup_time 199, hold_time 199. i2c_get_data_timing() returned sample_time 99, hold_time 49. i2c_get_timeout() returned 12.

...so the data sample time parameter has certainly halved since the 5.0.1 release, but the change has not helped with my problem I'm afraid.

AndreasKohn commented 1 year ago

Hi, @o-marshmallow just to mention that in this forum entry: https://www.esp32.com/viewtopic.php?f=2&t=33583 it is also mentioned that I2C code running fine on an ESP32 does not run on a ESP-C2 devboard. Here I have the impression that the length of the data send over I2C creates the problem. Let me please now if we can handle this problem in this issue as well, or if I should open up a new ticket. So far I tries all the recommendation from above and switched to 5.0.2, without any luck to solve the problem. Thanks

RobMeades commented 1 year ago

Seems I was wrong. I tried your example, using i2c_master_write_read_device() just as you have, modified to read two bytes back rather than one (since that's what our device (u-blox GNSS module, any type) wants to do) and to use pin 12 as a logic analyzer trigger on failure: it fails pretty much immediately, on the first or second go around the loop with error 263 (timeout). This is still using master (https://github.com/espressif/esp-idf/commit/54576b7528b182256c027de86eb605a172bc2821).

Here is the code:

#include <stdio.h>
#include <inttypes.h>
#include "sdkconfig.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/i2c.h"
#include "driver/gpio.h"

#define I2C_MASTER_PORT 0
#define DEVICE_ADDRESS 0x42 // un-shifted address!
#define TRIGGER_PIN 12

static esp_err_t i2c_master_init(void)
{
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = 18,
        .scl_io_num = 17, // I got the pin number wrong when I wrote up this issue, clock is actually on pin 17 not pin 19
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };

    i2c_param_config(I2C_MASTER_PORT, &conf);

    return i2c_driver_install(I2C_MASTER_PORT, conf.mode, 0, 0, 0);
}

void app_main(void)
{
    uint8_t data[1] = { 0xFD }; // This is the register that returns the number of bytes available, in two bytes, when read
    uint8_t datar[2] = { 0 };
    int32_t x = 0;
    esp_err_t ret = ESP_OK;
    gpio_config_t config;

    // Set up trigger pin for logic capture
    config.intr_type = GPIO_INTR_DISABLE;
    config.mode = GPIO_MODE_OUTPUT;
    config.pull_down_en = GPIO_PULLDOWN_DISABLE;
    config.pull_up_en = GPIO_PULLUP_DISABLE;
    config.pin_bit_mask = 1ULL << TRIGGER_PIN;
    ret = gpio_config(&config);
    if (ret != ESP_OK) {
        printf("Error initializing pin(s) 0x%08llx\n", config.pin_bit_mask);
        return;
    }
    gpio_set_level(TRIGGER_PIN, 0);

    ret = i2c_master_init();
    if (ret != ESP_OK) {
        printf("Error initializing the I2C\n");
        return;
    }

    while (1) {
        ret = i2c_master_write_read_device(I2C_MASTER_PORT, DEVICE_ADDRESS, data, 1, datar, 2, 1000 / portTICK_PERIOD_MS);
        if (ret != ESP_OK) {
            gpio_set_level(TRIGGER_PIN, 1);
            printf("Error %d sending data on attempt %ld\n", ret, x + 1);
            return;
        }
        x++;
        printf("Received: 0x%02x%02x\n", *datar, *(datar + 1));

        vTaskDelay(250 / portTICK_PERIOD_MS);
    }
}

...here is what it printed:

scl_low: 200
scl_wait_high: 98
scl_high: 102
sda_hold: 50
sda_sample: 100
setup: 200
hold: 200
tout: 12
Received: 0x0000
Error 263 sending data on attempt 2

...and here is what the logic analyzer shows:

image

Zooming-in on the first, successful case:

image

...that looks good, and zooming-in on the second, fail case:

image

...it seems to have just stopped clocking after sending the device address for the read part; the Saleae probe believes there has been an ack.

I tried removing my GPIO stuff, in case toggling pin 12 was somehow having an adverse effect, but that made no difference to the outcome. On other runs I also saw it stop after sending the device address for the write:

image

Unh? Just to be completely clear, if I switch back to my original app_main(), without touching the HW, I get the behaviour I saw when I originally raised this issue, i.e. 20ish errors in our 6 minute test. The difference with your i2c_master_write_read_device() function is, as you say, that it is faster, because of the static allocation.

Find attached a ZIP file containing .sal files for the success case, the fail on read case and the fail on write case: esp32s3_i2c_simple_example.zip.

EDIT: if I keep retrying, pressing the reset button on the board (I've done it maybe 100 times now), very occasionally (e.g. one in 20 times) it goes around the loop 3 or 4 times, once it has gone around the loop 64 times and once, just now, I saw it go around the loop 140 times, but the other 93% of the time it only went around the loop one or two times; the error is 263 in all cases.

o-marshmallow commented 1 year ago

Hi @RobMeades ,

I am not sure why using i2c_master_write_read_device would trigger the error more frequenctly than the former version. Maybe the fact that it is faster gives less time to the I2C controller to reset itself from unknown state?

Anyway, I am still looking at the timing configuration, checking each value. In the Technical Reference Manual, the ESP32-S3 I2C hardware component expects that scl_wait_high < sda_sample_time < scl_high period. This is actually the case with our values but it seems a bit tight, they are only one or two clock cycles apart.

I2C timing specification states that scl_wait_high equivalent must be of at most 1000 ns, so we can try making it smaller in the HAL layer, and thus delay sda_sample_time. As such, each of the data I mentioned above will be several clock cyclesa part from eachother.

Another thing that I have noticed is that i2c_stretch_protect_num is not set, and according to its description, I feel we may need it: Configures the time period to release the SCL line from stretching to avoid timing violation. Usually it should be larger than the SDA setup time. Even though we don't have clock stretching, this should "force" the SCL line to go back to high after a certain delay from what I understand.

Overall, here is the diff to apply these changes (including the esp_rom_printf previously mentionned): i2c.diff.txt

I tested this on my end, with the new SCL pin you gave, and with more receive data, and it still works on my end: Screenshot from 2023-05-16 13-55-48

Here is the output on init:

scl_low: 200
scl_wait_high: 64
scl_high: 136
sda_hold: 50
sda_sample: 100
setup: 200
hold: 200
tout: 12

If all of that doesn't work, the problem may not be in the configuration. How many devices have you connected to the I2C bus? Do you have any external pull-up resistors on your board? If not, can you try adding 4.7KOhm pull-up resistors on both SCL and SDA?

RobMeades commented 1 year ago

Thanks for continuing to try: no change with that patch I'm afraid, same failure patterns/statistics, and the output on init is:

scl_low: 200
scl_wait_high: 64
scl_high: 136
sda_hold: 50
sda_sample: 100
setup: 200
hold: 200
tout: 12

The board we are using (there is only one thing on the I2 bus) already has 2.2K pull-ups on the SDA and SCL lines, so that shouldn't be an issue (I've actually tried it with and without the pull-ups) EDIT: in case of cable issues, I've just made two very short SCL/SDA cables, each length 4 cm, but that hasn't helped I'm afraid [and anyway the board we originally found this issue on was a proper PCB]: i2c_schematic i2c

Anything else I can try?

RobMeades commented 1 year ago

Hold on a sec': since we've never tried using your i2c_master_write_read_device() functions before, I just switched back to an ESP32 (rather than ESP32-S3) board and, with this same simple example, it shows exactly the same error [and yet, of course, it shows not a single I2C error in our 6 minute long test (I retried that also with exactly the same code-base)].

Let me (a) try your most recent patch above on ESP32-S3 but with the 6 minute test (that uses the underlying I2C functions rather than the wrapper) and, if that doesn't work, (b) write a simple example using the underlying I2C functions that I know works on ESP32 (non-S3) and try that out again on ESP32-S3.

EDIT: (a) shows the same results as before I'm afraid, setting up (b) now...

RobMeades commented 1 year ago

Oh, and @o-marshmallow, don't forget @AndreasKohn's question which might have been lost between my ramblings above.

RobMeades commented 1 year ago

Sorted things out now: the I2C device needed to be reset and allowed to settle, something our core code does and the simple example needed to be made to do.

Below is the simple example code but, of course, it runs fine, doesn't show the problem: when the GNSS chip is "just ticking over" like this there are only a few hundred bytes of data to be read every second; there is a lot more variability/loading in the 6 minutes of regression tests that we run routinely.

It seems quite likely to me that, since the error we are seeing is a timeout, and the maximum tout value has gone from more than 10 milliseconds on ESP32 (we set 10 ms) to 0.387 microseconds (i.e. ~30 times smaller) on ESP32-S3, that our problem is that there are situations (maybe around device power-up) where 0.387 microseconds is just not long enough, the ESP32-S3 is timing-out when the I2C device is responsive, has received the data it was sent or will send the data it was asked for, just cannot always be that fast.

Is there any way to make tout larger than 0.387 microseconds, or is this a chip limitation?

#include <stdio.h>
#include <inttypes.h>
#include "sdkconfig.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "driver/i2c.h"
#include "driver/gpio.h"

#define I2C_MASTER_PORT 0
#define TIMEOUT_APB_TICKS 0x1f // 800000 == 10 ms
#define DEVICE_ADDRESS 0x42 // un-shifted address!
#define REGISTER_ADDRESS 0xFD
#define TRIGGER_PIN 12
#define DEVICE_RESET_BAR_PIN 19

#define ADDRESS_WRITE(__ADDRESS__) (((__ADDRESS__) << 1) | I2C_MASTER_WRITE)
#define ADDRESS_READ(__ADDRESS__) (((__ADDRESS__) << 1) | I2C_MASTER_READ)

static esp_err_t i2c_master_init(void)
{
    i2c_config_t conf = {
        .mode = I2C_MODE_MASTER,
        .sda_io_num = 18,
        .scl_io_num = 17,
        .sda_pullup_en = GPIO_PULLUP_ENABLE,
        .scl_pullup_en = GPIO_PULLUP_ENABLE,
        .master.clk_speed = 100000,
    };

    i2c_param_config(I2C_MASTER_PORT, &conf);

    return i2c_driver_install(I2C_MASTER_PORT, conf.mode, 0, 0, 0);
}

void app_main(void)
{
    uint8_t data[1] = { REGISTER_ADDRESS };
    uint8_t datar[128] = { 0 };
    int x = 0;
    int y;
    int z = 0;
    esp_err_t ret = ESP_OK;
    gpio_config_t config;

    gpio_set_level(TRIGGER_PIN, 0);
    gpio_set_level(DEVICE_RESET_BAR_PIN, 1);
    config.intr_type = GPIO_INTR_DISABLE;
    config.mode = GPIO_MODE_OUTPUT;
    config.pull_down_en = GPIO_PULLDOWN_DISABLE;
    config.pull_up_en = GPIO_PULLUP_DISABLE;
    config.pin_bit_mask = (1ULL << TRIGGER_PIN) | (1ULL << DEVICE_RESET_BAR_PIN);
    ret = gpio_config(&config);
    if (ret != ESP_OK) {
        printf("Error initializing pin(s) 0x%08llx\n", config.pin_bit_mask);
        return;
    }

    printf("Resetting i2C device...\n");
    gpio_set_level(DEVICE_RESET_BAR_PIN, 0);
    vTaskDelay(500 / portTICK_PERIOD_MS);
    gpio_set_level(DEVICE_RESET_BAR_PIN, 1);

    printf("Waiting to start...\n");
    vTaskDelay(3000 / portTICK_PERIOD_MS);

    ret = i2c_master_init();
    if (ret != ESP_OK) {
        printf("Error initializing the I2C\n");
        return;
    }

    ret = i2c_set_timeout(I2C_MASTER_PORT, TIMEOUT_APB_TICKS);
    if (ret != ESP_OK) {
        printf("WARNING: attempt to set timeout to %d APB ticks returned %d\n", TIMEOUT_APB_TICKS, ret);
    }
    ret = i2c_get_timeout(I2C_MASTER_PORT, &y);
    if (ret != ESP_OK) {
        printf("Unable to get I2C timeout\n");
        return;
    }
    printf("I2C timeout is now %d APB ticks\n", y);

    while (1) {
        // Send one byte with no stop, then receive two bytes back again, the last byte of which should be NACKed
        ret = i2c_master_write_read_device(I2C_MASTER_PORT, DEVICE_ADDRESS, data, 1, datar, 2, 1000 / portTICK_PERIOD_MS);
        if (ret != ESP_OK) {
            gpio_set_level(TRIGGER_PIN, 1);
            printf("Error %d exchanging data on attempt %d (%d byte(s) read in total)\n", ret, (x + 1), z);
            return;
        }
        z += 2;
        y = (int) ((((uint32_t) datar[0]) << 8) + datar[1]);
        if (y > 0){
            // Data bytes can now be read from address 0xFF, which the device will have auto-incremented to
            if (y > sizeof(datar)) {
                y = sizeof(datar);
            }
            ret = i2c_master_read_from_device(I2C_MASTER_PORT, DEVICE_ADDRESS, datar, y, 1000 / portTICK_PERIOD_MS);
            if (ret != ESP_OK) {
                gpio_set_level(TRIGGER_PIN, 1);
                printf("Error %d reading data on attempt %d (%d byte(s) read in total)\n", ret, (x + 1), z);
                return;
            }
            z += y;
        }
        if (x % 1000 == 0) {
            printf("Still running, %d byte(s) read in total\n", z);
        }
        x++;
    }
}
o-marshmallow commented 1 year ago

@AndreasKohn Sorry for the late reply, yes you should create a new issue (I see that you did that already, thank you!)

@RobMeades How did you calculate the 0.387 microseconds timeout?

The value set as a timeout is a power of two of SCLK clock cycles. In other words, if the timeout is set to 12, it is actually equal to 2^12 * SCLK_PERIOD, in current case, the SCLK_PERIOD is 25ns (40MHz). So the timeout is in fact equal to 102.4 microseconds.

As stated in the technical reference manual, the maximum value for timeout is 22, which is equal to 2^22 * 25ns = 104.9ms. If you want to have around 10ms of timeout, set this value to 19.

RobMeades commented 1 year ago

Oh! I had read the ESP-IDF documentation as saying that the value is in 80 MHz APB clock cycles. If it is in a power of two of 40 MHz that makes rather a large difference :-).

I have been changing a lot of things at once, and in different scenarios, over the last two days; in particular realising that the device I am talking to over I2C is a lot slower to respond in the first few seconds after power on: it is a complicated beast and I guess is booting etc. rather busily to begin with; it is around here that most of the timeouts are occurring (we power the device up and down many times during the 6 minute test). That, and I have been tweaking many variables on the ESP32S3 side. Since I had thought the timeout value was so small as to be useless, I had left it at the default of 12. Last night I ran the 6 minute test with the timeout set to what I perceived to be the maximum value of to 0x1F again, having checked i2c_reg.h for ESP32S3 (versus i2c_reg.h for for ESP32); the i2c_set_timeout() function permits this value to be set: where would I find the limit of 22 in the documentation? Anyway, I had no errors which, with your explanation above, makes a lot of sense.

Summary: I think we're nearly there, thank you for your expert attention, I would just like to make sure I have references to the bits of ESP-IDF documentation where the value set using i2c_set_timeout() is accurately defined.

o-marshmallow commented 1 year ago

I understand the confusion about the timeout. On the ESP32, the I2C was clocked by APB indeed, which is 80MHz: image (ESP32 Technical Reference Manual page 295)

But on the ESP32-S3, the I2C SCL_FSM (responsible for the SCL signal) is clocked by the SCLK clock: image (ESP32-S3 Technical Reference Manual page 954)

That SCLK clock can be either sourced from XTAL clock (40MHz) or from the RTC Fast Clock (17.5MHz). The resulted frequency is calculated as followed: image (ESP32-S3 Technical Reference Manual page 956)

In the previous examples, the divider was 1, so the SCLK frequency was the same as the source one, which was 40MHz XTAL. The divider was also calculated in i2c_ll_cal_bus_clk function.

I start wondering whether it could be your device that does clock stretching (pulls the SCL line low) in order to tell the ESP32-S3 to slow down on the transfer because it (device) is busy treating the previous command. If you are seeing this issue only when stressing the bus, I think it could make sense.

RobMeades commented 1 year ago

That's excellent stuff: the other critical thing here, I think, is whether the value is a multiplier of the clock or a power-of-two of the clock: I've been reading the ESP32 technical document and the ESP32S3 technical document, plus the ESP-IDF description of the i2c_xxx_timeout() functions and I think all might benefit from some clarification. Just to keep it simple:

o-marshmallow commented 1 year ago

Correct me if my link is wrong but I think I Just found the Integration manual for your device: https://content.u-blox.com/sites/default/files/MAX-M10S_IntegrationManual_UBX-20053088.pdf

It is clearly stated that the device will perform clock stretching when the inner CPU is busy servicing an interrupt: image

When you perform a lot of transactions one after the other, the probability that the CPU will be servicing an interrupt while an I2C transfer occurs is very high, mainly at 100KHz (because the transfer last longer). This is why you see clock stretching and thus, get a timeout.

RobMeades commented 1 year ago

Indeed, so what it comes down to is that, based on my experience with ESP32, I had taken the ESP-IDF documentation literally:

image

On the face of it, there is no difference for ESP32S3, except that i2c_set_timeout() returns an error when I pass it the value I used with ESP32. Checking those header files, I saw that the register had become only 5-bits wide, hence I asked this question on the forum as to why the value is now so tiny as to be useless, but there was no answer. Even the ESP32-S3 documentation says:

image

The key missing phrase, I think, is "power of two"!

o-marshmallow commented 1 year ago

@RobMeades Good catch! This is indeed a mistake in the ESP32-S3 TRM, not only the clock source is wrong but the power of two is missing. The TRM talks about the power of two but for the FSM timeout (I2C_SCL_ST _TO_I2C). I will report this error internally.

The function i2c_ll_set_tout comments/documentation is correct:

/**
 * @brief  Configure I2C timeout
 *
 * @param  hw Beginning address of the peripheral registers
 * @param  tout_num The I2C timeout value needs to be set (2^tout in core clock cycle)
 *
 * @return None
 */
static inline void i2c_ll_set_tout(i2c_dev_t *hw, int tout)
{
    hw->to.time_out_value = tout;
}

Else you are right, a 5-bit value wouldn't make sense as a timeout, the maximum would have been way too small :smile:

RobMeades commented 1 year ago

Perfect: last question then - is there a way to determine, at ESP-IDF level, the current I2C clock?

EDIT: I see the i2c_config_t.clk_flags field, where the possible flags appear to be I2C_SCLK_SRC_FLAG_FOR_NOMAL, I2C_SCLK_SRC_FLAG_AWARE_DFS and I2C_SCLK_SRC_FLAG_LIGHT_SLEEP: which of these is XTAL versus RC_FAST_CLK, or is there somewhere else I should be looking?

Guessing that an RC network is the RTC clock source and hence, if you set I2C_SCLK_SRC_FLAG_AWARE_DFS, that likely selects RC_FAST_CLK instead of XTAL, since the clock source for the RTC is the only clock source that won't change with sleep, but that is a complete guess...?

o-marshmallow commented 1 year ago

@RobMeades You guessed right, the clock is selected based on the clk_flags you pass when configuring the I2C bus. Manually selecting the source clock would cause incompatibility between IDF programs because some clocks are not available on all targets.

The flag that correpsonds to the RC clock is I2C_SCLK_SRC_FLAG_LIGHT_SLEEP, as you said, it frequency doesn't change, even in light sleep. Even though RC clock also presents I2C_SCLK_SRC_FLAG_AWARE_DFS flag, this flag is mainly used to select REF_TICK as a clock source (only supported on S2 as far as I know)

RobMeades commented 1 year ago

Got it, thanks again, closing this issue.