espressif / arduino-esp32

Arduino core for the ESP32
GNU Lesser General Public License v2.1
13.43k stars 7.37k forks source link

Robustify i2cProcQueue function #1509

Closed LaurentLouf closed 6 years ago

LaurentLouf commented 6 years ago

Hardware:

Board: ESP32 WROOM32 Core Installation/update date: last IDE name: ESP-IDF Flash Frequency: 140Mhz ? Upload Speed: 115200

Description:

I encounter random crashes using a DS3231 with I2C. While this may be on my end since I'm using it inside a BLE function, the error is pretty clear : IntegerDivideByZero with this line concerned https://github.com/espressif/arduino-esp32/blob/2fbe4908facddc91e299f56aa4afd0573b20dabb/cores/esp32/esp32-hal-i2c.c#L956 . portTICK_PERIOD_MS should'nt be equal to zero, so my best guess is that i2cGetFrequency returns zero (which is possible, considering the code of the function), which isn't checked and so creates a possible error.

Debug Messages:

Not sure that helps but anyway :

[E][esp32-hal-i2c.c:988] i2cProcQueue(): I2C exitCode=0x112
Guru Meditation Error: Core  0 panic'ed (IntegerDivideByZero)
. Exception was unhandled.
Core 0 register dump:
PC      : 0x401226fc  PS      : 0x00060f30  A0      : 0x8012291c  A1      : 0x3ffee2e0  
0x401226fc: i2cProcQueue at /home/laurent/Projects/laurent-esp32/components/arduino/cores/esp32/esp32-hal-i2c.c:956

A2      : 0x3ffc1498  A3      : 0x00000000  A4      : 0x00000032  A5      : 0x00000000  
A6      : 0x000007ff  A7      : 0x00004e20  A8      : 0x801226fc  A9      : 0x3ffee2c0  
A10     : 0x00000000  A11     : 0x0000060e  A12     : 0x3fff9008  A13     : 0x3fff5e18  
A14     : 0x0000000a  A15     : 0x3ffc1498  SAR     : 0x00000016  EXCCAUSE: 0x00000006  
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0x00000000  

Backtrace: 0x401226fc:0x3ffee2e0 0x40122919:0x3ffee320 0x401208d5:0x3ffee340 0x40120929:0x3ffee360 0x40120a59:0x3ffee380 0x40096b52:0x3ffee3a0 0x400edf26:0x3ffee3e0 0x4011c3fa:0x3ffee590 0x4011d751:0x3ffee870 0x4011bdbd:0x3ffee890 0x40122d21:0x3ffee910 0x4011cd86:0x3ffee930 0x4011d1ed:0x3ffee970 0x4012e20a:0x3ffee990 0x40129866:0x3ffee9d0
0x401226fc: i2cProcQueue at /home/laurent/Projects/laurent-esp32/components/arduino/cores/esp32/esp32-hal-i2c.c:956

0x40122919: i2cWrite at /home/laurent/Projects/laurent-esp32/components/arduino/cores/esp32/esp32-hal-i2c.c:1303

0x401208d5: TwoWire::writeTransmission(unsigned short, unsigned char*, unsigned short, bool) at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/Wire/src/Wire.cpp:312

0x40120929: TwoWire::endTransmission(bool) at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/Wire/src/Wire.cpp:312

0x40120a59: TwoWire::endTransmission() at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/Wire/src/Wire.cpp:312

0x40096b52: RtcDS3231<TwoWire>::GetDateTime() at /home/laurent/Projects/laurent-esp32/components/Rtc/src/RtcDS3231.h:296
 (inlined by) RealTimeClock::now() at /home/laurent/Projects/laurent-esp32/main/RealTimeClock/RealTimeClock.cpp:70

0x400edf26: NightSettings::set_alarm_time(char*) at /home/laurent/Projects/laurent-esp32/main/State/laurentState.h:16
 (inlined by) NightSettings::onWrite(BLECharacteristic*) at /home/laurent/Projects/laurent-esp32/main/BLEServer/NightSettingsService.h:61

0x4011c3fa: BLECharacteristic::handleGATTServerEvent(esp_gatts_cb_event_t, unsigned char, esp_ble_gatts_cb_param_t*) at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/BLE/src/BLECharacteristic.cpp:720

0x4011d751: BLECharacteristicMap::handleGATTServerEvent(esp_gatts_cb_event_t, unsigned char, esp_ble_gatts_cb_param_t*) at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/BLE/src/BLECharacteristicMap.cpp:94

0x4011bdbd: BLEService::handleGATTServerEvent(esp_gatts_cb_event_t, unsigned char, esp_ble_gatts_cb_param_t*) at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/BLE/src/BLEService.cpp:319

0x40122d21: BLEServiceMap::handleGATTServerEvent(esp_gatts_cb_event_t, unsigned char, esp_ble_gatts_cb_param_t*) at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/BLE/src/BLEServiceMap.cpp:92

0x4011cd86: BLEServer::handleGATTServerEvent(esp_gatts_cb_event_t, unsigned char, esp_ble_gatts_cb_param_t*) at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/BLE/src/BLEServer.cpp:179

0x4011d1ed: BLEDevice::gattServerEventHandler(esp_gatts_cb_event_t, unsigned char, esp_ble_gatts_cb_param_t*) at /home/laurent/Projects/laurent-esp32/components/arduino/libraries/BLE/src/BLEDevice.cpp:528 (discriminator 1)

0x4012e20a: btc_gatts_cb_to_app at /home/laurent/Projects/laurent-esp32/components/esp-idf/components/bt/bluedroid/btc/profile/std/gatt/btc_gatts.c:54
 (inlined by) btc_gatts_cb_handler at /home/laurent/Projects/laurent-esp32/components/esp-idf/components/bt/bluedroid/btc/profile/std/gatt/btc_gatts.c:774

0x40129866: btc_task at /home/laurent/Projects/laurent-esp32/components/esp-idf/components/bt/bluedroid/btc/core/btc_task.c:110
stickbreaker commented 6 years ago

@LaurentLouf

[E][esp32-hal-i2c.c:988] i2cProcQueue(): I2C exitCode=0x112

This error message show an arbitration failure. The esp32 was using the i2c bus and another master collided with it. Do you have a multi-master configuration?

uint32_t i2cGetFrequency(i2c_t* i2c) will only return 0 under two conditions:

uint32_t i2cGetFrequency(i2c_t * i2c)
{
    if(i2c == NULL) {
        return 0;
    }
    uint32_t result = 0;
    uint32_t old_count = (i2c->dev->scl_low_period.period+i2c->dev->scl_high_period.period);
    if(old_count>0) {
        result = APB_CLK_FREQ / old_count;
    } else {
        result = 0;
    }
    return result;
}

Either of these conditions are abnormal, never should be encountered during normal execution. Normally scl_low_period and scl_high_period are both 400, and APB_CLK_FREQ is 80,000,000 so the result is 100kHz(default i2c bus rate).

I don't have any idea how i2c could be null, the call to i2cWrite() would fail with ERROR_DEV is i2c was null before it ever called i2cProcQueue()

So, the only way I can see this divide by zero occurring is that if APB_CLK_FREQ is less than scl_low_period + scl+high_period or scl_low_period and scl_high_period are both zero?

Chuck.

LaurentLouf commented 6 years ago

This error message show an arbitration failure. The esp32 was using the i2c bus and another master collided with it. Do you have a multi-master configuration?

That may be it. I recently inherited this code, so there are a few things that I still don't fully get about how it functions. This problems occur when connecting via Bluetooth and sending some data, so if I understand correctly, this happens on core 0 where the bluetooth code executes. On the other core, it's the application code, which uses i2c, so could it be a conflict between the two ? Is this doable/advisable to try to handle i2c communication on two different cores ? Should I configure i2c on the second core or just re-design the code to avoid such configurations ?

And a big thanks for your investigation of the problem !

stickbreaker commented 6 years ago

@LaurentLouf describe you i2c circuit a multi-master system is abnormal. I haven't seen anyone actual use a esp32 in a multi-master environment.

You can add additional debug to i2cGetFrequency() to verify the actual cause:

uint32_t i2cGetFrequency(i2c_t * i2c)
{
    if(i2c == NULL) {
        log_e( "i2c==NULL" );
        return 0;
    }
    uint32_t result = 0;
    uint32_t old_count = (i2c->dev->scl_low_period.period+i2c->dev->scl_high_period.period);
    if(old_count>0) {
        result = APB_CLK_FREQ / old_count;
      } else {
        log_e( "old_count==%ld",old_count );
        result = 0;
    }
    if( result==0 ){
      log_e("fail freq=0, APB_CLK_FREQ=%ld, Clock=%ld",APB_CLK_FREQ,old_count);
    return result;
}

I think you have a problem with a memory overwrite issue. Your report describes a event that cannot happen through normal I2C transactions. There is something else happening.

Chuck.

LaurentLouf commented 6 years ago

@stickbreaker Thanks for your help. As expected, removing the i2c logic from the Bluetooth code fixed the problem, which seems to be that i2c devices were controlled by two different threads at the same time. Moving it to only one thread and no concurrent access is in all cases a better practice.

macedolfm commented 5 years ago

Well, my experience sharing a TwoWire among 4 separated tasks is that holding a xSemaphoreTakeRecursive() to the in use Wire (1 or 0) is enough. I did succefullly share the Wir0 & Wire1 with BMP180+BMP280 each plus a OLED display on Wire0 (driven by the lib) on a Heltec ESP32_LoRa board. The only caveat is the SSD1306Wire does not "like" to run inside a task, but is very happy on the main Arduino loop() with the Semaphore on init() & display() (when the I2C is really used), the Adafruit Library on the other hand crashed over & over. So far, so good :-)