ShawnHymel / introduction-to-rtos

513 stars 124 forks source link

Part 10 deadlock demo timeout solution: tasks attempt to give/return mutexes that have never been taken #5

Open neilbalch opened 3 years ago

neilbalch commented 3 years ago

I noticed while going through this (excellent BTW!) tutorial series that the ESP32 was resetting repeatedly during the execution of the deadlock demo timeout solution.

Serial output: ``` ESP-ROM:esp32s2-rc4-20191025 Build:Oct 25 2019 rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT) SPIWP:0xee mode:DIO, clock div:2 load:0x3ffe6100,len:0x424 load:0x4004c000,len:0x844 load:0x40050000,len:0x2460 entry 0x4004c180 � ---FreeRTOS Deadlock Demo--- Task A took mutex 1 Task B took mutex 2 Task A timed out waiting for mutex 2 Guru Meditation Error: Core 0 panic'ed (Unhandled debug exception). Debug exception reason: Stack canary watchpoint triggered (Task A) Core 0 register dump: PC : 0x00001004 PS : 0x3ffc75b9 A0 : 0x80028fb7 A1 : 0x8002d7c4 A2 : 0x3ffc7330 A3 : 0x00000054 A4 : 0x00000000 A5 : 0x00000080 A6 : 0x3ffc7500 A7 : 0x00000003 A8 : 0x80028d3d A9 : 0x00000000 A10 : 0x00000000 A11 : 0x00001800 A12 : 0x00000000 A13 : 0x80028d3d A14 : 0x3ffc7350 A15 : 0x00000054 SAR : 0x00000002 EXCCAUSE: 0x00000001 EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x3f005c0c LCOUNT : 0x400275ac Backtrace:0x00001001:0x8002d7c4 |<-CORRUPTED ELF file SHA256: 0000000000000000 Rebooting... ESP-ROM:esp32s2-rc4-20191025 Build:Oct 25 2019 rst:0xc (RTC_SW_CPU_RST),boot:0x8 (SPI_FAST_FLASH_BOOT) Saved PC:0x400261c0 SPIWP:0xee mode:DIO, clock div:2 load:0x3ffe6100,len:0x424 load:0x4004c000,len:0x844 load:0x40050000,len:0x2460 entry 0x4004c180 Guru Meditation Error: Core 0 panic'ed (Cache error). Core 0 register dump: PC : 0x40099949 PS : 0x00060d34 A0 : 0x800883ae A1 : 0x3fffe2a0 A2 : 0x02000000 A3 : 0x00000000 A4 : 0x02000000 A5 : 0xfffffffc A6 : 0x00000000 A7 : 0x00000000 A8 : 0x00000000 A9 : 0x6000e000 A10 : 0x007dff2d A11 : 0x02000000 A12 : 0x00000030 A13 : 0x00000030 A14 : 0x00231d69 A15 : 0x00000000 SAR : 0x00000007 EXCCAUSE: 0x00000007 EXCVADDR: 0x00000000 LBEG : 0x00000030 LEND : 0x00000030 LCOUNT : 0x40026008 Backtrace:0x40099946:0x3fffe2a00x400883ab:0x3fffe2b0 0x40088d4b:0x3fffe2d0 0x40088f49:0x3fffe2f0 0x40025fd4:0x3fffe320 0x40050bbe:0x3fffe340 0x40050e53:0x3fffe370 0x4004c1c1:0x3fffe4a0 0x40011371:0x3fffe560 0x4000f788:0x3fffe6f0 0x400072a5:0x3fffe710 ELF file SHA256: 0000000000000000 Rebooting... ```

As opposed to what's in this repo, I realized that this version of the function is required to avoid the core reset: (same change for Task B as well)

// Task A (high priority)
void doTaskA(void *parameters) {
  // Loop forever
  while (1) {
    // Take mutex 1
    if (xSemaphoreTake(mutex_1, mutex_timeout) == pdTRUE) {
      // Say we took mutex 1 and wait (to force deadlock)
      Serial.println("Task A took mutex 1");
      vTaskDelay(1 / portTICK_PERIOD_MS);

      // Take mutex 2
      if (xSemaphoreTake(mutex_2, mutex_timeout) == pdTRUE) {
        // Say we took mutex 2
        Serial.println("Task A took mutex 2");

        // Critical section protected by 2 mutexes
        Serial.println("Task A doing some work");
        vTaskDelay(500 / portTICK_PERIOD_MS);

        // Give back mutex
        xSemaphoreGive(mutex_2);
      } else {
        Serial.println("Task A timed out waiting for mutex 2");
      }

      // Give back mutex
      xSemaphoreGive(mutex_1);
    } else {
      Serial.println("Task A timed out waiting for mutex 1");
    }

    // Wait to let the other task execute
    Serial.println("Task A going to sleep");
    vTaskDelay(500 / portTICK_PERIOD_MS);
  }
}

I can submit a PR to fix this if you're interested? Love your work here, Shawn!

ShawnHymel commented 3 years ago

Apologies for the super long delay. I'm just now getting a chance to go through emails and GitHub issues.

Thank you for the fix! I'd like to keep the demos used in the video exactly as they appear in the video (if possible). I don't remember running into the reset issue on my ESP32 in the deadlock timeout demo. Do you have any idea why it was happening for you?

neilbalch commented 3 years ago

Honestly, I can't say I remember as of now 🤦. Looking back at it now, I'm pretty sure the only change I ended up making from your stock solution was to move the mutex_n give instructions inside the if statements.

I don't have the time to try and re-run through my debugging steps right now, but I think this still makes sense from a theoretical perspective. Looking at the serial output and original code, Task A attempts to give mutex_2 immediately after Task A timed out waiting for mutex 2. Unless I misunderstand the usage of a mutex, this should be an invalid (and not desired) operation. (the mutex should only be given back when the task was actually able to take it successfully)

Although, I may very well be missing something important here. For reference, my version of the tutorial is in my repository here: github.com/neilbalch/ESP32-FreeRTOS-Tutorial

ShawnHymel commented 3 years ago

Interesting! I thought that in FreeRTOS, if you "gave" a mutex that had not been taken, nothing would happen. That's the behavior I saw in my example, but it looks like you're seeing resets.

From what I can tell, your resets are happening because of a stack overflow (Task A is filling up the allocated stack to the point the canary watchpoint is tripped). Does the reset happen right away, or does it take some time to occur when you run the program? Did you change anything else other than moving the xSemaphoreGive() calls to inside the if statements?

neilbalch commented 3 years ago

If I remember correctly, and based on the serial output I provided in the original issue text, the resets started occurring immediately after an MCU reset/program cycle. I can't remember how or why, but I thought I originally attributed that reset exception message to the task being in a permanently blocked state as it waited for the mutex to be "given," but your explanation now seems more proper.

As for a list of changes, I believe the only thing I did was move the mutex give instructions. The full code I used is in my repo here.

wanggaoteng commented 2 months ago

I also have the same problem, my board is esp32-s3. Thank you for providing the correct code.