Multiple read command in a single I2C command link times out or return garbage without error. (IDFGH-8147)

espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.

Apache License 2.0

13.78k stars 7.31k forks source link

Multiple read command in a single I2C command link times out or return garbage without error. (IDFGH-8147) #9644

Open X-Ryl669 opened 2 years ago

X-Ryl669 commented 2 years ago

Answers checklist.

[X] I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
[X] I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
[X] I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

master

Operating System used.

macOS

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

No response

Development Kit.

ESP32 WROVER

Power Supply used.

USB

What is the expected behavior?

I'm creating a static i2c_cmd_buffer where I'm preparing a transaction with 2 successive reads, like this:

#define FilterError(x) ((x) == ESP_OK)

uint8 buffer[I2C_LINK_RECOMMENDED_SIZE(4)]; // Because 2 read transaction as documented
i2c_cmd_handle_t cmd = i2c_cmd_link_create_static(buffer, I2C_LINK_RECOMMENDED_SIZE(4));
uint8 devAddr = 0x76;  // For BMP210
uint8 regAddr[] = { 0xFA, 0xF7 };
uint8 dataBuffer[6] = {};

for (size_t index = 0; index < 2; i++) {
    if (!FilterError(i2c_master_start(cmd))) return false;
    if (!FilterError(i2c_master_write_byte(cmd, (uint8)((devAddr << 1) | I2C_MASTER_WRITE), true))) return false;
    // Set the register we are writing to
    if (!FilterError(i2c_master_write(cmd, (uint8*)&regAddr[index], 1, true))) return false;
    // Repeated start
    if (!FilterError(i2c_master_start(cmd))) return false;
    if (!FilterError(i2c_master_write_byte(cmd, (uint8)((devAddr << 1) | I2C_MASTER_READ), true))) return false;
    if (!FilterError(i2c_master_read(cmd, &dataBuffer[i * 3], 3, I2C_MASTER_LAST_NACK))) return false;
    // Tried with and without the following line (repeated start or not), but it doesn't work
    if (!FilterError(i2c_master_stop(cmd))) return false;
 }

// Later
if (!FilterError(i2c_master_cmd_begin(port, cmd, 1000 / portTICK_PERIOD_MS))) return false;

What is the actual behavior?

If there's only one read, the above code works. But if there are multiple read (like above) it fails with timeout, even with the stop part.

If I create 2 cmd link, it works. But I want to create a complete transaction with 2 read and (re)execute it whenever required. What's the purpose of I2C_LINK_RECOMMENDED_SIZE if it only support one transaction only ?

As a side note, the documentation state:

For example, if one wants to perform a read on an I2C device register, TRANSACTIONS must be at least 2, because the commands required are the following:
    write device register
    read register content

but the actual code does this in i2c_master_write_read_device does this:

{
    esp_err_t err = ESP_OK;
    uint8_t buffer[I2C_TRANS_BUF_MINIMUM_SIZE] = { 0 };

(That's I2C_LINK_RECOMMENDED_SIZE(1) and not 2)

Steps to reproduce.

Please copy the above (simple) code and connect on any I2C device where you can read 2 register with a 8 bit address.

Debug Logs.

Error is: !ESP_ERR_TIMEOUT(263)

More Information.

No response

X-Ryl669 commented 2 years ago

Few progress here:

I have similar issue without using a static buffer (typically, by using i2c_cmd_link_create) so it's not related to the way the command link handle is created. Also, by merging the first 2 writes in a single write, that is, by doing:

uint8 skipWrite[2] = {(uint8)((devAddr << 1) | I2C_MASTER_WRITE), 0xFA };
if (!FilterError(i2c_master_write(cmd, skipWrite, 2, true))) return false;

Then no timeout error is reported, but the second read returns 0xFF (that's worse, IMHO, since it feels like it worked but returned garbish).

Also, by reading the I2C code, there's a race condition going on here in i2c_master_cmd_begin_static:

            i2c_hal_write_txfifo(&(i2c_context[i2c_num].hal), write_pr, fifo_fill);
            i2c_hal_write_cmd_reg(&(i2c_context[i2c_num].hal), hw_cmd, p_i2c->cmd_idx);
            i2c_hal_write_cmd_reg(&(i2c_context[i2c_num].hal), hw_end_cmd, p_i2c->cmd_idx + 1);
            i2c_hal_enable_master_tx_it(&(i2c_context[i2c_num].hal));
            p_i2c->cmd_idx = 0;
            if (i2c_cmd_is_single_byte(cmd) || cmd->total_bytes == cmd->bytes_used) {
                p_i2c->cmd_link.head = p_i2c->cmd_link.head->next;
                if(p_i2c->cmd_link.head) {
                    p_i2c->cmd_link.head->cmd.bytes_used = 0;
                }
            }
            p_i2c->status = I2C_STATUS_WRITE;

Here you enable the hardware interrupt (line 1421), but you later modify the p_i2c->cmd_link.head. So if the CPU reach line 1421 (enable_master_tx_it), and the interrupt fires, it'll call the interrupt handler, that'll, in turn, call i2c_master_cmd_begin_static (so you're re-entering this function before the effect on the linked list is applied and it's wrong).

Typical example, for this corruption:

You have a sequence of "read / write / read" command and are actually processing the write command.
The cpu process to line 1424 (cmd_link.head = head->next) but not bytes_used = 0 and also not the status = WRITE line
I2C interrupt triggers (cpu enters i2c_isr_handler_default)
Write event is processed, it calls back i2c_master_cmd_begin_static
Because the status wasn't updated yet, the line 1354 that read
```
if (p_i2c->cmd_link.head != NULL && p_i2c->status == I2C_STATUS_READ) {
```
pass the test (the previous operation was a read, and the link head isn't NULL). It shouldn't pass this test, since we are executing a write not a read.
Then, the code tries to copy the I2C's read buffer while it was executing a write command

I didn't follow what happens next, but it explains why I'm seeing garbage in the read buffer. The same problem happens for the read operation but the race window is shorter, since the code is doing the status = READ part just after enabling the interrupt.

X-Ryl669 commented 2 years ago

I've tried to bitbang the I2C command flow above and it works, with or without the stop condition (in a repeated start condition) between both read. So it's definitively an issue with this HW implementation not the device itself.

Then I've added logs in the driver and it times out in the command that's created for the 3rd write_byte. The interrupt handler never receive an event for this write's command, so the timeout is actually a real timeout. This could happen if the device does ACK the write, but in that case, I guess we would get a HW's event with TOUT right ?

I'm trying to capture the output of the actual bus traffic with a logic analyzer, so I can see what's happening and if the device is ACKing the third write or not.

X-Ryl669 commented 2 years ago

Ok, so here's the actual result on the console:

E i2c: I2C prog 0: cmd idx: 0
E i2c: I2C prog write: 2/2/2, cmd idx: 1
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C prog 0: cmd idx: 0
E i2c: I2C prog write: 1/0/1, cmd idx: 1
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C prog read: 2/0/2, cmd idx: 0
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C read int cb: 2/0/2, cmd idx: 0
E i2c: I2C prog read: 1/0/1, cmd idx: 0
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C read int cb: 1/0/1, cmd idx: 0
E i2c: I2C prog 0: cmd idx: 0
E i2c: I2C prog write: 2/2/2, cmd idx: 1
!ESP_ERR_TIMEOUT(263) in main.cpp:86

I've added logging with ESP_DRAM_LOGE in multiple places (typically, in the generic command prog, write command prog write, read cmd prog read, interrupt handler isr, read callback read int cb in cmd_begin_static).

So it means that what I expected above doesn't happen, yet the second write doesn't trigger the ISR and thus a timeout is received.

On the bus, with a logical analyzer, I'm getting this:

The first 4 communications are correct (a write of the address to read, then 2 reads with a ACK for the first one and a NACK for the last one which doesn't happen), then it goes crazy, it sends a read and never finish (while it should send a write transaction instead). It repeats reading undefinitively (well, until the end of the capture, like 2 thousand reads).

If you look carefully, it seems the last NACK for the read command only happens inside for the "start" command, maybe it's expected?, but then its waaaayyyy later. I would have expected SCL to go high after the last bit in the read command, but it doesn't:

X-Ryl669 commented 2 years ago

I've tried with a stop before the 2nd read and while the sequence seems correct on the terminal (the values read are wrong for the 2nd read), a physical capture show that the HW is still struck/crashed:

E i2c: I2C prog 0: cmd idx: 0
E i2c: I2C prog write: 2/2/2, cmd idx: 1
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C prog 0: cmd idx: 0
E i2c: I2C prog write: 1/0/1, cmd idx: 1
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C prog read: 2/0/2, cmd idx: 0
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C read int cb: 2/0/2, cmd idx: 0
E i2c: I2C prog read: 1/0/1, cmd idx: 0
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C read int cb: 1/0/1, cmd idx: 0
E i2c: I2C prog 3: cmd idx: 0 // This is stop
E i2c: I2C prog 0: cmd idx: 1 // This is start for 2nd read request
E i2c: I2C prog write: 2/2/2, cmd idx: 2
E i2c: I2C isr event type 5: cmd idx: 0
E i2c: I2C prog 0: cmd idx: 0
E i2c: I2C prog write: 1/0/1, cmd idx: 1
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C prog read: 2/0/2, cmd idx: 0
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C read int cb: 2/0/2, cmd idx: 0
E i2c: I2C prog read: 1/0/1, cmd idx: 0
E i2c: I2C isr event type 4: cmd idx: 0
E i2c: I2C read int cb: 1/0/1, cmd idx: 0
E i2c: I2C prog 3: cmd idx: 0 // This is stop
E i2c: I2C isr event type 5: cmd idx: 0
Trans: 80 00 00 FF FF FF

The capture is the same as the above except for the last stuff (last image in my previous message): Here, there isn't any write of the register address, and no read whatsoever. In fact, it's even worse, since nothing's said it has failed.

X-Ryl669 commented 2 years ago

Just a remark in the logic in cmd_begin, when you parse a read request, you're making a read command and you append a end command (in cmd index 1). However, when the read complete (cmd index 0), the ISR is invoked, and you resume in cmd_begin_static, where you reset the cmd index to 0, so the end command is never executed. That's probably why we see the ACK / NACK happening in the later command and not at end of the ISR's context. Maybe you should execute the end command, then and only then, resetting cmd index to 0 and then continue appending the later commands.

Also, there's a huge temporal distance/delay between commands, so it's very far from optimal (since every step requires a ISR processing and a FreeRTOS queueing/scheduling). It would be better to enqueue as much commands as possible since you have 14 slots available if I understand the TRM correctly), to avoid these delay and performance issues.

o-marshmallow commented 2 years ago

Hi @X-Ryl669 ,

Thanks for providing that many details! I tested on my side and I see the same issue, few things here to note however:

I2C commands where not meant to represent multiple successive transactions. This is why we now have must simpler functions (i2c_master_write_to_device, i2c_master_read_from_device, i2c_master_write_read_device) that should reply to almost all the common use-case of I2C devices. Is there any particular reason that let you chain two transactions in the same command instead of making two transactions?
You are right about the fact that the way the software commands are written to the hardware could be highly optimized (because the hardware buffer has 16 entries). This is something that should be done in the NG (new-generation) driver implementation.
The race condition you talked about makes the assumption that as soon as you write to the tx FIFO and enable TX interrupt, an interrupt could occur. But in practice that is not the case. The transaction won't start until the function i2c_ll_trans_start is called. This is done before leaving the function, outside the loop where the previous function calls happened, and thus, after modifying any command link head or field.
The I2C hardware IP doesn't seem to support a stop following a restart. However, there is a way to force it to put a pause in between to prevent the error you are seeing. I have just created a small patch to overcome this issue: i2c_patch.txt (The file is a .diff but Github won't let me upload files with such extension)

Overall, the best solution would be to split up your code to actually perform two transactions thanks to i2c_master_write_read_device. If you still want to chain two transactions within one I2C command link, you can apply the patch above and have a try.

X-Ryl669 commented 2 years ago

Is there any particular reason that let you chain two transactions in the same command instead of making two transactions?

I was seeing very sub-optimal I2C communication time, where a read of 6 bytes (at 2 different address) took 1ms where it should have taken 270µs (108 bits@400kHz). I thought it was due to task scheduling (and it's probably the case), so instead of executing multiple transactions that were allocating buffer and freeing them, I thought that executing a single static transaction should be a lot faster. And since the new release of esp-idf introduced the I2C_LINK_RECOMMENDED_SIZE macro, I thought it was the way to go.

Then I've looked into the I2C code and I've understood that even a single read transaction is in fact split in multiple sub-transaction, each with FreeRTOS's scheduling latencies issue in them. I don't know why the driver is so dependent on FreeRTOS here. Honestly I don't understand why the ISR had to post to a queue while executing a single command of a larger transaction, since this creates huge delays between commands.

This is something that should be done in the NG (new-generation) driver implementation.

Can I test it ? Is it available in beta somewhere ?

The race condition you talked about makes the assumption that as soon as you write to the tx FIFO and enable TX interrupt, an interrupt could occur. But in practice that is not the case. The transaction won't start until the function i2c_ll_trans_start is called. This is done before leaving the function, outside the loop where the previous function calls happened, and thus, after modifying any command link head or field.

You're right. I've figured this out later when I read again the TRM. I've still a question how to understand this line:

**Note**: When there are more than three segments, the address of an END command in the cmd should not be
altered into another command by the next segment.

If you know what it mean.

The I2C hardware IP doesn't seem to support a stop following a restart. However, there is a way to force it to put a pause in between to prevent the error you are seeing.

I'll test your patch and come back to you as soon as possible. Please notice that it works if I bitbang the I2C bus without a stop (the sensor I'm talking to, allows to have a repeated start after a NACK read to set up a new read pointer address (a "write" I2C command)). But it doesn't work with the hardware either and there's no STOP / START in that case.

Overall, the best solution would be to split up your code to actually perform two transactions

I was making 2 transactions before, but it's so slow that it's useless for my application (I have a very busy I2C bus with one current sensor that I must read at very regular interval of ~250µs to accumulate the power consumption like a coulomb sensor). Here I can't afford a 4x longer I2C transaction time for the other sensors.

In the HW captures, the I2C bus is most of the time idle anyway so I wonder what I'm capturing between I2C transactions is FreeRTOS's overlay.

X-Ryl669 commented 2 years ago

I've tested your patch and it's working to prevent the HW from crashing the bus so I guess it's a good workaround. But it's not faster than splitting transactions, so I'm back to point 0 now, how to use the HW implementation at full speed?

Also, it's obviously not working in the Start/Write/Start/Read/Start/Write/Start/Read/Stop case, and I still don't understand why. I guess it's due to the ACK bit of a read appearing in the next command (not the current read command) so the next Start fails because the master is likely holding the SDA line high for the NACK ?

I'm wondering why the current driver is architectured like this? Why doesn't it convert the command to fill the HW's command fifo and tries to execute all of them in one shot, instead of splitting each command by its own small subcommand ?

o-marshmallow commented 2 years ago

Hi @X-Ryl669 ,

As you saw, I2C_LINK_RECOMMENDED_SIZE and these related static functions won't influence the transaction time-to-finish once started. It is making the creation of the i2c link faster ad they won't perform any alloc.

I guess the driver was made like this (one operation at a time) because it was simpler to implement, less edge cases. Imagine we have two WRITEs and the first one has more bytes than the hardware FIFO, then we would still need to break the transaction down into two.

Anyways, I have a question regarding your requirements, you said you have to read the sensor at a regular interval of 270µs, but you also said that you have to transfer 6 bytes? Like every time?

If that's the case, it means that in theory each transaction will take about 180µs, if we add the software overhead, you will spend more than 200µs easily for a single transaction.

Moreover, I am convinced you have other things to process in other tasks so the task that performs the I2C is not the only task, FreeRTOS needs to schedule the other tasks to let them run. So the delay you will see between transaction is very long. What delay are you seeing between each transaction?

On a very small example that has only a single app task that performs:

/* 4 bytes in total */
while (1) {
    i2c_master_write_read_device(I2C_MASTER_NUM, 0x20, wr, 2, rd, 2, 100);
}

on an ESP32, with a CPU frequency of 160MHz and an I2C bus at 400KHz with 2KOhm external pull-up resistors, I get 50µs between each transaction.

One way to optimize this a bit is to prepare a static command link, like you did, with i2c_cmd_link_create_static and i2c_master_stop, etc, that will be re-use across all the transactions. you don't need to free and re-create one every time you want to perform a transaction. For example, you can imagine having:

    i2c_cmd_handle_t cmd = i2c_cmd_link_create_static(buffer, I2C_LINK_RECOMMENDED_SIZE(2));
    uint8 devAddr = 0x76;
    uint8 write_buffer[3] = { devAddr, 0 };

    i2c_master_start(cmd);
    i2c_master_write(cmd, write_buffer, 3, true);
    i2c_master_stop(cmd);

    /* Later, we can alter write_buffer, to generate a new operation */
    for (int i = 0; i < 10; i++) {
        write_buffer[1] = i;
        write_buffer[2] = i + 1;
        i2c_master_cmd_begin(port, cmd, 1000 / portTICK_PERIOD_MS);
    }

That's perfectly valid, maybe this is already what youa re doing.

With the same hardware setup as I mentioned above, by using this method of re-using a cmd link, I drop from 50µs to 40µs between each transaction.

Finally, if that's not fast enough, you can create a FreeRTOS task dedicated to that, pinned to the second core, with a high priority (higher than your other tasks).

X-Ryl669 commented 2 years ago

Anyways, I have a question regarding your requirements, you said you have to read the sensor at a regular interval of 270µs, but you also said that you have to transfer 6 bytes? Like every time?

Yes, and to be honest it's a bit more complex than that on my system, I have 3 INA3221 sensor + 1 BMP280 sensor + 1 ICM20948 sensor on the same bus. Only one of the current sensor needs to be read at this interval, so I've a I2C scheduling task with a high priority, and I'm doing a ping/pong reading "battery current sensor / other sensor (one of env. sensor, or attitude sensor or other power sensor) / current sensor / other sensor / etc...). So, yes, I've prepared static transactions and I expected to be able to schedule them with the lowest overhead possible.

If that's the case, it means that in theory each transaction will take about 180µs, if we add the software overhead, you will spend more than 200µs easily for a single transaction.

I'm observing 1ms for a theoretical 270µs transaction with long pauses between the internal I2C subcommands (like between write and read transactions, between read and stop transaction etc...). It's clearly linked with the isr's handler time and looking at the code, I guess it's due to the alive event being queued for each sub-command transaction (I guess its used to deal with software timeout from the caller).

If the HW queue could be maximized (with the minimum ISR time used to refill/empty the queue and FIFO), I think it should be possible to avoid this overhead (so instead of having a END interrupt after each subcommand, maybe it's worth only dealing with the transaction complete interrupt (or, only have a END interrupt after a READ and before a WRITE, since the READ command overwrite the WRITE's RAM buffer). TLDR: Reducing the interrupt overhead and trying to remove/reduce the FreeRTOS's overhead too.

What delay are you seeing between each transaction?

My ESP32 isn't too much loaded, since doing the I2C processing is 80% of the tasks it must perform. The other 2 task are sending UART communication (which is HW based too, validated and it works), and some SPI code for capturing a weight sensor sampling (HW based too, validated and it works). There is exceptional relay control via GPIO that doesn't count in the time budget. So, yes the I2C communication is central for the system to work. I'm currently seeing around ~100µs to 200µs between subcommands (on average). I'm currently cleaning my code to remove all the traces I've added to be able to measure this more precisely. Also, even if I set the I2C speed to X Hz, its clock is going slower (for example, I measure 88kHz for a 100kHz target, ~350kHz for a 400kHz target). I don't know if it's expected or not.

That's perfectly valid, maybe this is already what you are doing.

With the same hardware setup as I mentioned above, by using this method of re-using a cmd link, I drop from 50µs to 40µs between each transaction

Yes that's what I was doing. But I wasn't seeing these delays (those are completely acceptable for my application). I'll try again now that the transaction works and report here with captures. I'm not seeing such low latency as soon as FreeRTOS is involved either, so maybe I haven't configured FreeRTOS the right way ? Also I'm running the ESP32 at 240MHz.

When you said a new generation driver is coming, is it already written or just planned?

o-marshmallow commented 2 years ago

Hi @X-Ryl669 ,

Sorry, I forgot to reply to that question: the new generation driver is planned, not written yet.

Having an effective I2C clock speed slower than the theoretical one is expected, but the gap for you is not negligeable. Are you using the internal pull-up resistors? If yes, you should definitely add external pull up resistors to both SCL and SDA lines. Try with 4.7Kohms resistors first and if you still see that the effective SCL is far from 400KHz, use 2Kohm resistors instead.

Is the hardware still modifiable? If yes, you could install the high priority sensor on one I2C port and the other non-critical sensors on the I2C port, since the ESP32 has two hardware I2C port.

1ms is indeed very long for such transaction. What you can do to test whether this is due to the ISR handling is to pin your I2C task (the high priority one) on the core 1 and keep the other tasks, UART- and SPI-related, on core 0. As such, the I2C interrupt handler will always be called from the core 1, check the function xTaskCreatePinnedToCore

@Dazza0 any inputs related to FreeRTOS that could cause such delays?

X-Ryl669 commented 2 years ago

I'll plug an oscilloscope to look at the slew rate and shape of the I2C bits. There are external pull up on the bus, on each sensor (they are all optional but one) so there should be an N parallel pull-up resistor (I'll measure the actual value and make some test). You make me think about this, thanks, I'm pretty sure I've enabled pull up on the GPIO configuration for the I2C line, I'll disable them, should be useless anyway.

I'm out of pins on the ESP32 module we are using, I can't find 2 other free pins, they are all used for the sensors and the relays control and the signal leds. That's the main default of the ESP32, there isn't enough pins ;-)

I don't want to use an I2C IO extender.

o-marshmallow commented 2 years ago

Using several pull-up in parallel is not necessary, it will make the total pull-up resistor value of the bus lower.

Using an I2C I/O expander will make things worse as you will also need to program it using I2C the bus, so it means more bytes to send every time.

Do you mean that you have used up all the 26 available pins on the ESP32 module? Are the connected modules also time critical? If not, you could imagine reducing a amount of pins used by adding chips to manage the LEDs or other components? For example, if you have 8 LEDs, instead of having 1 GPIO per LED, you could imagine adding a serial-in/parallel-out shift register, such as the 74HC595, to your circuit that will let you control 8 LEDs thanks to 2 GPIOs: clock and serial data in. (You could even re-use the SPI bus to control the register, in that case it would take you only 1 GPIO as you already have set up an SPI in your circuit)

Anyways, what you can try at the moment is "reserving" one core for your high priority I2C as I said before :smile:

X-Ryl669 commented 2 years ago

Ok, it took too long because I had to port all platformio stuff to work with the master branch (BTW, espressif hasn't published the 11.2 version of the toolchain to platformio's registry).

So I'm measuring the time it takes to complete both transactions (not including the time to allocate/delete the command link), removed all logs and here are my results:

With this patch, i2C running at 400kHz, no other task than app_main, both read transaction in a single command link: 562µs
With this patch, i2C running at 400kHz, no other task than app_main, each transactions in its own command link: 608µs
Disabling "pullup enable" option: no change in timing

o-marshmallow commented 2 years ago

Hi @X-Ryl669 ,

562µs is more than expected, do you have a screenshot of the logic analyzer capture you have made? What is the effective SCL frequency you can see?

Can you please check the resistance value between SCL-3.3V and SDA-3.3V?

X-Ryl669 commented 2 years ago

So here are the captures: Sampling of the analyzer is 2MHz, so a sample duration is 0.5µs.

I2C Bus was set to 200KHz (else, it failed with timeout when I plugged the logic analyzer, probably too high capacitance on the line) The whole capture:

The whole duration is 1457 samples, which is 728µs (reported as 890µs in the software, I guess due to notifying the main task that the transfer is done) There are very few pauses in the clock line (the largest is between the stop and start: 33 samples / 16.5µs see below)

Between ACK and repeated start condition:

6 samples = 3µs

Between stop and restart condition:

20 samples : 10µs

I guess there isn't much to gain here, it seems that the I2C bus is quite busy with no dead time.

When I set the bus to 400kHz, I'm not getting half the time here (it would be good enough) but 562µs instead of ~450µs. I'll try to fix why I can't capture it with my analyzer.

As for the line's pull up resistance:

If I only plug the BMP280 sensor, I read 10kOhm between SDA and VCC (same with SCL).
If I plug all the sensors, I read 2.5kOhm on those lines.

o-marshmallow commented 2 years ago

Thanks @X-Ryl669 for the details here!

Honestly, the timings I see in your capture are correct for a 200KHz I2C Bus. The timing between a stop and a (re)start is expected.

However, what you said about the pull-up resistors confirm what I was thinking. You have 10kOhm pull-up resistors on each sensors (as you said), and as you have 5 sensors, the total resistance between SDA/SCL and 3.3V will be 2kOhm in theory (1/R = 1/R₁ + 1/R₂ + ... + 1/R_N, in your case, R = 1/(5/10000) = 2000)

This should be enough to reach 400KHz, but in that case you need ALL the sensors to be plugged, else you won't be able to reach this speed.

I would advise you to try connecting your SDA and SCL lines to 3.3V with a 2kOhm (or 1kOhm) on the main board, so that they doesn't rely on the connected sensors' resistors, and retry your test.

For info, on ESP modules, with a 10kOhm pull-up resistor, you can only reach about 340/350KHz. (Of course the circuit capacitance is also an important factor but this is why I think you should try both 1kOhm and 2kOhm)

X-Ryl669 commented 2 years ago

Sure, you're right. While doing my tests, I've removed all other sensors since initially the SW reported timeout errors. Since the pull up resistors are on the sensor boards, that's why I only have 10kOhm on the line. I need to tinker with the harness to plug the logic analyzer. I'll solder wires on the I2C line so I can make proper captures with the logic analyzer.

I'll do like you recommended and try to free 2 pins for using the additional I2C bus by removing 2 leds.

As a side note, I've found other cases of the system failing depending on how to set up the transaction (with your patch applied):

If you append Start / Write(dev address + W) / Write (reg address) / Rep Start / Write(dev addr + R) / Read N-1 with ACK / Read 1 NACK / Stop, typical read transaction, it works
If you append Start / Write(dev address + W) / Write (reg address) / Rep Start / Write(dev addr + R) / Read N-1 with ACK / Read 1 NACK / Stop / Start / Write(dev address + W) / Write (reg address) / Rep Start / Write(dev addr + R) / Read N-1 with ACK / Read 1 NACK / Stop, 2 successive read transactions, it fails
If you append Start / Write 2 bytes (dev address + W, reg address) / Rep Start / Write(dev addr + R) / Read N-1 with ACK / Read 1 NACK / Stop / Start / Write(dev address + W, reg address) / Rep Start / Write(dev addr + R) / Read N-1 with ACK / Read 1 NACK / Stop, 2 successive read transaction, but with a smaller command length, it works
If you append Start / Write 2 bytes (dev address + W, reg address) / Rep Start / Write(dev addr + R) / Read N-1 with ACK / Read 1 NACK / Rep Start / Write(dev address + W, reg address) / Rep Start / Write(dev addr + R) / Read N-1 with ACK / Read 1 NACK / Stop, 2 successive read transactions with a repeated start as specified in I2C standard, it fails

So it seems to depend on the transaction operations' length, or since the driver is splitting the transaction is a small subset of command with END command in-between, it should work. Yet, it doesn't. I guess it's due to the position of the end command in the HW command's buffer, and I've spot a note in the TRM, page 285, that seems linked to this: When there are more than three segments, the address of an END command in the cmd should not be altered into another command by the next segment.. I don't understand what it means.

X-Ryl669 commented 2 years ago

If I understand the driver code correctly, any write operation is converted to a "write + end" command. Same for the read operation that's converted to "read + end" command. However, if you follow the read callback code in i2c_master_cmd_begin_static, around line 1360, if the read operation hasn't finished reading all bytes, the command index is set to 0 (as expected). But else, it isn't reset to 0, so the next "command parsing/splitting" will happen after the executed read+end operation, so the HW command buffer will be like this:

Before interrupt:
 [ Read 1 NACK | End ]  
/\ HW pointer

In Interrupt:
 [ Read 1 NACK | End ]  
                /\ HW pointer
At line 1441 (after executing the code around 1360, total_bytes == bytes_used):
 [ Read 1 NACK | End | Stop ]  
                /\ HW pointer

It works if the read is the last operation in the transaction because at line 1457 `if (p_i2c->cmd_link.head == NULL ... => cmd_idx = 0`

However, as soon as there are additional operations after the read operation (for example a rep start) line 1457 isn't executed and the cmd index keeps incrementing while it's already processed until it's set back to 0 on the next write, like this:
 [ Read 1 NACK | End | Start | Write | End ]  
/\ HW pointer

That might explain why I'm seeing infinite read in the first capture, and also why your patch works with a stop&start successive operations, since you're triggering a new END interrupt that's acting like a write, that is, bypassing the cmd_index non-reset. I think the cmd_idx should be reset to 0 after a read operation also when total_bytes == bytes_used so that progress can be made in the HW command buffer.

o-marshmallow commented 2 years ago

Hi @X-Ryl669,

Thank you for all these details and the tests you have performed!

I can confirm what you have found in the previous message after finally understanding the sentence When there are more than three segments, the address of an END command in the cmd should not be altered into another command by the next segment.

The TRM specifies that a segment is a transaction starting from command index 0 and ending with an END (or STOP, but not important here). When we have more than 3 segments, the 4th+ segments must not modify the command operation that contains an END operation. Which means that if during a segment, the END command is the second one (command index 1), in the next one, END must NOT be altered, so the command index 1 should still be an END.

I will take back your example 2. from your previous message. Let's perform 2 successive read transactions. The driver will first perform: 1 - START + WRITE + END (Segment 1) 2 - WRITE + END (Segment 2) 3 - RSTART + WRITE + END (Segment 3) 4 - READ + END (Segment 4) [...] Starting from the fourth segment, the second command (index 1) shall always be an END, until the transaciton is finished with a STOP.

Moreover, the hardware doesn't support a succession of STOP/START commands because it only supports a single transaction within the commands table. In fact, as soon as the STOP command is encountered, the HW internal command index is reset to 0. So, the START command that was located after the STOP won't be executed.

Thinking about the patch back, inserting the END command may not be necessary, but it is still necessary to tell the software that after the end of the current transfer, there is another one waiting (managed by the added I2C_STATUS_BREAK status)

X-Ryl669 commented 2 years ago

Thanks for the explanation. Unfortunately, it raises more question than it solves. I don't understand why the HW is so complex for following I2C specifications...

Anyway:

When the HW finds a END command, and the END_INT is enabled, it triggers an interrupt. But it also reset its HW's command pointer to zero, right ? (I thought that the HW was checking the "command done" bit for each command and skipping those that were done). Else, that doesn't match what we are observing, that is: the command 0 is overwritten with new work to do and work is resumed from here, it seems.
How can the HW knows it's executing the 3rd transaction (or the 4th...) ? Since each previous transaction starts with HW's command at position 0 ? Is there a counter we can reset somehow ?
Can we store a END at the 16th command item, and fill the empty slots with 0 sized byte length's WRITE ? I've tried to add one in the command list and it seems to be ignored, as expected. The idea would be to always submit 16 commands to the HW that will all execute down to the 16-th command (which is an END) and don't care about the iteration index.

I might be wrong, but as I've understood, the HW stop condition doesn't need to be in the command buffer for the stop condition interruption to be triggered. In fact, I've done another project in the past with multiple master on the bus, and the ESP32 HW was generating the interrupt when another master generated the stop condition (not a big deal per se). So, it seems the logic to generate interrupts is based on observing the line, not on the command being executed (except for END, right ?), whatever the hardware which is manipulating the lines.

I've tried to set the cmd_index to 0 in the READ finished callback, but it doesn't work either for solving the failing cases.