espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.27k stars 7.2k forks source link

Watchdog triggered in i2c_master_transmit when I2C slave not switched on (IDFGH-12629) #13627

Open combalafra01 opened 4 months ago

combalafra01 commented 4 months ago

Answers checklist.

IDF version.

5.2.1

Espressif SoC revision.

ESP32-S3

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32-S3-DevKitC-1

Power Supply used.

USB

What is the expected behavior?

receive ESP_ERR_TIMEOUT error code after xfer_timeout_ms milliseconds if no slave is available

What is the actual behavior?

the watchdog triggers after CONFIG_ESP_TASK_WDT_TIMEOUT_S seconds

Steps to reproduce.

create master I2C with a first ESP32 create slave I2C with a second ESP32 connect them switch on master ESP32 but not slave ESP32 when master ESP32 call i2c_master_transmit with xfer_timeout_ms = 1000ms, the watchdog is triggered after CONFIG_ESP_TASK_WDT_TIMEOUT_S seconds while we expect to receive ESP_ERR_TIMEOUT the behaviour is the same when calling i2c_master_probe. The legacy i2c driver was working correctly, this is a problem on the new i2c driver currently if we have an issue with a slave i2c (component KO, wiring issue, ...) , the master is completely blocked by this behaviour

Debug Logs.

E (12965) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (12965) task_wdt:  - IDLE0 (CPU 0)
E (12965) task_wdt: Tasks currently running:
E (12965) task_wdt: CPU 0: main
E (12965) task_wdt: CPU 1: IDLE1
E (12965) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x420287DE:0x3FC9BE00 0x42028BFC:0x3FC9BE20 0x40377B05:0x3FC9BE50 0x42025512:0x3FCA7740 0x420256F0:0x3FCA7770 0x420257B3:0x3FCA77B0 0x420261C4:0x3FCA77E0 0x420155AB:0x3FCA7840 0x42015809:0x3FCA7860 0x4200BBC0:0x3FCA7880 0x4200C00F:0x3FCA78A0 0x4200B3FB:0x3FCA78D0 0x4200B3D3:0x3FCA78F0 0x420ED64B:0x3FCA7910 0x4037E7FD:0x3FCA7940
0x420287de: task_wdt_timeout_handling at D:/dev/espidf/esp/v5.2.1/esp-idf/components/esp_system/task_wdt/task_wdt.c:441 (discriminator 3)
0x42028bfc: task_wdt_isr at D:/dev/espidf/esp/v5.2.1/esp-idf/components/esp_system/task_wdt/task_wdt.c:515
0x40377b05: _xt_lowint1 at D:/dev/espidf/esp/v5.2.1/esp-idf/components/xtensa/xtensa_vectors.S:1240
0x42025512: s_i2c_send_commands at D:/dev/espidf/esp/v5.2.1/esp-idf/components/driver/i2c/i2c_master.c:430
0x420256f0: s_i2c_transaction_start at D:/dev/espidf/esp/v5.2.1/esp-idf/components/driver/i2c/i2c_master.c:509
0x420257b3: s_i2c_synchronous_transaction at D:/dev/espidf/esp/v5.2.1/esp-idf/components/driver/i2c/i2c_master.c:780
0x420261c4: i2c_master_transmit at D:/dev/espidf/esp/v5.2.1/esp-idf/components/driver/i2c/i2c_master.c:966

More Information.

legacy i2c driver i2c_master_cmd_begin was working properly, sending back ESP_ERR_TIMEOUT when there was an i2c issue like this one.

combalafra01 commented 4 months ago

I expect to receive ESP_ERR_TIMEOUT after xfer_timeout_ms instead of having the watchdog triggered after CONFIG_ESP_TASK_WDT_TIMEOUT_S seconds. Of course xfer_timeout_ms < CONFIG_ESP_TASK_WDT_TIMEOUT_S in my case CONFIG_ESP_TASK_WDT_TIMEOUT_S = 10 seconds while xfer_timeout_ms = 1000 milliseconds

mythbuster5 commented 4 months ago

Try it on latest master. I provided a fix there. Backport to 5.2 is on the way. @combalafra01

combalafra01 commented 4 months ago

thank you I cannot test with master, please tell me when 5.2 is available

mythbuster5 commented 4 months ago

OK, then, try it on latest branch release/v5.2? I think some fix already there.. @combalafra01

combalafra01 commented 4 months ago

last merge on i2c_master.c on 5.2 was the one of March 21st (Add parameter to config I2C scl await time) So I guess I will have to wait

acha666 commented 3 months ago

@mythbuster5

At least it's fixed in v5.2.1-528-g70fc0e3037

reinstall idf with branch release/v5.2 helps

combalafra01 commented 3 months ago

I am about to migrate to 5.2.2 do you know if the fix is included in this version ?

combalafra01 commented 2 months ago

it does not fail anymore but I have other issues, I will open a new ticket for that

combalafra01 commented 2 months ago

@mythbuster5 the fact that the master was rebooting is fixed now but there is still an issue: if the slave was off when the master was transmitting, it is ok now, we receive an error but when the slave is back we still have the same error when the master tries to transmit. The master do not see that the slave is back, I don't know why for the moment. If we reboot the master and configure it again it works. One solution would be to probe the slave before transmitting something but in term of performance it adds some load. With the legacy driver that was OK, the master transmit was a success when the slave was back after being down. This is a very common use case.

combalafra01 commented 2 months ago

following new tests it appears that it works if the master is started while the sale is down but not if the master while the slave is up.

use case OK:

  1. slave down, master started =>master do not fail and detect that slave is not there during transmit
  2. slave started => master see that slave is there and transmit works

use case KO:

  1. slave up, master started => slave detected by master and transmit success
  2. slave shutdown => master detect that slave is down, transmit fails
  3. slave started => master do not detect that slave is back, transmit fails
combalafra01 commented 2 months ago

Probing is not a solution because if slave gets down after the probing but before the transmission, the master fails and cannot recover

combalafra01 commented 2 months ago

see #13962 I think you have to release the semaphone in the while loop of s_i2c_send_commands when the xSemaphoreTake fails because it is in a while loop. So the semaphore can be taken OK on first loop, never released for whatever reason (slave not available in our case) fails on second loop because not released, returns and in this case never released.

adding xSemaphoreGive seems to solve the problem.

if (xSemaphoreTake(i2c_master->cmd_semphr, ticks_to_wait) != pdTRUE) {
  // Software timeout, clear the command link and finish this transaction.
  i2c_master->cmd_idx = 0;
  i2c_master->trans_idx = 0;
  atomic_store(&i2c_master->status, I2C_STATUS_TIMEOUT);
  xSemaphoreGive(i2c_master->cmd_semphr);
  return;
}