earlephilhower / arduino-pico

Raspberry Pi Pico Arduino core, for all RP2040 and RP2350 boards
GNU Lesser General Public License v2.1

Multicore_FreeRTOS.ino example stops working properly after ~1000 s on Pico W #2553

Closed: jlbirccyn closed this issue 3 days ago

jlbirccyn commented 5 days ago

Hello,

I started playing with FreeRTOS on a Pico W. I'm using version 4.1.1 on Mac OS X 14.7, and I compile with the CPU clock set to 133 MHz. I noticed that after about 1000 s (this is not strictly deterministic), the BLINK task stops working properly and the LED stops blinking, while the serial printout keeps working. Before the BLINK task stops working properly, I get output like this:

# Tasks: 9
ID, NAME, STATE, PRIO, CYCLES
0: CORE0            Running    4 492397155
1: IDLE1            Running    0 1494968105
2: IDLE0            Ready      0 1356573463
3: USB              Blocked    6 1009866299
4: BLINK            Blocked    1 293675381
5: CORE1            Blocked    4 42743134
6: IdleCore1        Blocked    7 23556
7: IdleCore0        Blocked    7 14095
8: Tmr Svc          Blocked    2 26125
val: 1063

Here the CORE0 task (loop) is Running, which is normal since it's the task that prints the table, and BLINK is Blocked. BLINK's cycle count grows by about 300k between two consecutive printouts.

After the LED has stopped flashing, BLINK appears as Running:

# Tasks: 9
ID, NAME, STATE, PRIO, CYCLES
0: CORE0            Running    4 510351894
1: BLINK            Running    1 1185629487
2: IDLE0            Ready      0 10490835
3: IDLE1            Ready      0 140899477
4: USB              Blocked    6 1211734597
5: CORE1            Blocked    4 44043757
6: IdleCore1        Blocked    7 23556
7: IdleCore0        Blocked    7 14095
8: Tmr Svc          Blocked    2 26125
val: 1106

and BLINK's cycle count now grows by just under 133M between two consecutive printouts. In other words, BLINK takes up almost 100% of one core.
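For reference, these counter deltas can be converted into a share of one core. The sketch below is a hypothetical host-side helper (`coreSharePercent` is not part of FreeRTOS or arduino-pico); it assumes the CYCLES column ticks once per CPU cycle at 133 MHz and that the table is printed once per second.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: percentage of one core consumed by a task, given how far
// its run-time counter advanced between two printouts. Assumes the counter
// ticks once per CPU cycle.
double coreSharePercent(uint64_t counterDelta, double clockHz, double intervalSec) {
  assert(clockHz > 0.0 && intervalSec > 0.0);
  const double cyclesAvailable = clockHz * intervalSec;  // cycles one core runs in the interval
  return 100.0 * static_cast<double>(counterDelta) / cyclesAvailable;
}
```

Under those assumptions, the healthy ~300k per second is roughly 0.23% of a core, while "just under 133M" per second is essentially the whole core.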

Best regards

earlephilhower commented 5 days ago

Is this on the RP2040 or the RP2350? They have significantly different guts...

jlbirccyn commented 5 days ago

It is a Pico W, so an RP2040.

maxgerhardt commented 4 days ago

Does the same thing happen on a regular Pico (RP2040)? Just to rule out that the WiFi chip died and is hanging up digitalWrite() or something.

earlephilhower commented 4 days ago

If you have a hung chip, could you get a stack trace for the second core? It has to be frozen on core 1, not core 0, because if core 0 were stuck the printouts would stop...

jlbirccyn commented 4 days ago

Hello,

> Same thing happens for a regular Pico (RP2040)?

I tested on a regular Pico (RP2040) and it works fine. It ran all night without problems.

On a regular Pico, the cycle count for BLINK increases by about 9,000 per second. On a Pico W it increases by about 200k to 300k per second. So some software activity interrupts BLINK, and FreeRTOS counts that activity as BLINK's CPU time.
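The effect described here follows from how per-task run-time counters are kept: in FreeRTOS the current task's counter is advanced when the task is switched out, by the time elapsed since it was switched in, and an interrupt is not a task switch. The toy model below is host-side C++, not FreeRTOS source; `Slice`, `Isr`, `Accounting` and `account` are hypothetical names. It shows how interrupt time that falls inside BLINK's slice ends up billed to BLINK.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model, not FreeRTOS source. A Slice is a period during which `task` is
// the current task on a core; an Isr is an interrupt that ran during that time.
struct Slice {
  std::string task;
  uint64_t start;  // switch-in timestamp, in CPU cycles
  uint64_t end;    // switch-out timestamp, in CPU cycles
};

struct Isr {
  uint64_t start;     // when the interrupt fired
  uint64_t duration;  // cycles spent in the handler
};

struct Accounting {
  std::map<std::string, uint64_t> charged;  // what a switch-based counter reports
  std::map<std::string, uint64_t> ownCode;  // cycles actually spent in the task's own code
};

// Assumes slices and interrupts do not overlap one another.
Accounting account(const std::vector<Slice>& slices, const std::vector<Isr>& isrs) {
  Accounting result;
  for (const Slice& s : slices) {
    assert(s.end >= s.start);
    uint64_t isrCycles = 0;
    for (const Isr& irq : isrs) {
      // Interrupts do not switch tasks, so one that falls inside this slice
      // is invisible to the counter and gets billed to the current task.
      if (irq.start >= s.start && irq.start + irq.duration <= s.end) {
        isrCycles += irq.duration;
      }
    }
    assert(isrCycles <= s.end - s.start);
    result.charged[s.task] += s.end - s.start;
    result.ownCode[s.task] += (s.end - s.start) - isrCycles;
  }
  return result;
}
```

For example, a 1000-cycle BLINK slice that contains an 800-cycle interrupt is charged 1000 cycles, even though BLINK's own code ran for only 200.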

> Not that it's the WiFi chip that died and hangs up digitalWrite() or something.

I tested on two different Pico W boards and got the same behavior on both. I also uploaded the ScanNetworks.ino example sketch and it works on both boards, so I think the WiFi chip is OK.

earlephilhower commented 3 days ago

If it only happens on the PicoW, that actually pinpoints the problem well. On the PicoW the LED is driven by the WiFi chip, which is controlled over an SPI link (emulated with PIO). Suppose there's a corner case where some periodic SPI operation (not generated by the app, since there's no WiFi in the sketch) gets swapped out, and the LED command gets swapped in while that SPI operation is still in progress... and then boom.
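The "swapped in during an already-running SPI operation" failure can be pictured with a deterministic toy model. The byte values and the names `Transaction`, `interleave` and `framesIntact` are hypothetical; this is not the real WiFi-chip or PIO-SPI protocol, only an illustration of why a transaction split by another writer becomes garbage to the receiver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Transaction = std::vector<uint8_t>;  // bytes of one bus command, meant to go out back to back

// Simulates task B being scheduled while task A is midway through its transaction:
// the first `preemptAfter` bytes of A go out, then all of B, then the rest of A.
std::vector<uint8_t> interleave(const Transaction& a, const Transaction& b, std::size_t preemptAfter) {
  assert(preemptAfter <= a.size());
  const auto split = a.begin() + static_cast<std::ptrdiff_t>(preemptAfter);
  std::vector<uint8_t> wire(a.begin(), split);
  wire.insert(wire.end(), b.begin(), b.end());
  wire.insert(wire.end(), split, a.end());
  return wire;
}

// The receiver can only make sense of the bus if each transaction arrives whole,
// in either order.
bool framesIntact(const std::vector<uint8_t>& wire, const Transaction& a, const Transaction& b) {
  std::vector<uint8_t> ab(a), ba(b);
  ab.insert(ab.end(), b.begin(), b.end());
  ba.insert(ba.end(), a.begin(), a.end());
  return wire == ab || wire == ba;
}
```

Splitting at offset 0 or at the very end of A leaves both transactions intact; any split point in between produces a byte stream that matches neither order.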

Thinking about it more, I think the code is not safe on the PicoW because of the digitalWrite. You can't access the WiFi chip from any core other than core 0, or bad things will happen. The WiFi chip management (which sends the LED message) is not multicore safe.
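The fix Earle settles on later in this thread is to pin BLINK to core 0 with `vTaskCoreAffinitySet`. A more general way to express the same rule (only one execution context may touch a driver that is not multicore safe) is a single-owner queue. The sketch below is a host-side analogy that uses `std::thread` in place of cores; `SingleOwnerDevice`, `post`, `serve` and `writeLed` are hypothetical and are not arduino-pico APIs.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a driver that must only ever be called from one
// execution context (on the Pico W, the WiFi chip that also drives the LED).
class SingleOwnerDevice {
 public:
  // Safe to call from any thread: the request is only queued here.
  void post(bool ledOn) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(ledOn);
    }
    ready_.notify_one();
  }

  // Must be called from the owner thread only: waits for `count` requests
  // and performs each device access itself.
  void serve(std::size_t count) {
    for (std::size_t handled = 0; handled < count; ++handled) {
      std::unique_lock<std::mutex> lock(mutex_);
      ready_.wait(lock, [this] { return !queue_.empty(); });
      const bool on = queue_.front();
      queue_.pop_front();
      lock.unlock();
      writeLed(on);  // the only place the "device" is touched
    }
  }

  // Written only by the owner thread; read these after joining it.
  std::vector<std::thread::id> callers;  // thread that performed each access
  std::vector<bool> states;              // LED state written by each access

 private:
  void writeLed(bool on) {
    callers.push_back(std::this_thread::get_id());
    states.push_back(on);
  }

  std::mutex mutex_;
  std::condition_variable ready_;
  std::deque<bool> queue_;
};
```

Other threads call `post()` and return immediately; every actual device access happens inside `serve()` on the owner thread, so the driver never sees two callers at once.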

earlephilhower commented 3 days ago

What's the longest you got it to run before losing the blink? I added the required core pinning and am at 30 minutes of runtime (>1800sec)

....

void setup() {
  TaskHandle_t blinkTask;
  Serial.begin(115200);
  xTaskCreate(blink, "BLINK", 256, nullptr, 1, &blinkTask);
#ifdef ARDUINO_RASPBERRY_PI_PICO_W
  // The PicoW WiFi chip controls the LED, and only core 0 can make calls to it safely
  vTaskCoreAffinitySet(blinkTask, 1 << 0);
#endif
  delay(5000);
}
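The second argument to `vTaskCoreAffinitySet` is a bitmask in which bit N allows the task to run on core N, so `1 << 0` restricts BLINK to core 0. The helper below is a hypothetical host-side illustration of that encoding (`affinityMask` and `mayRunOn` are not FreeRTOS functions).

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: builds a FreeRTOS-SMP-style core affinity mask in which
// bit N set means the task may run on core N.
uint32_t affinityMask(std::initializer_list<unsigned> cores) {
  uint32_t mask = 0;
  for (unsigned core : cores) {
    assert(core < 32);
    mask |= 1u << core;
  }
  return mask;
}

// True if the mask allows the task to be scheduled on `core`.
bool mayRunOn(uint32_t mask, unsigned core) {
  assert(core < 32);
  return (mask >> core) & 1u;
}
```

On the two-core RP2040, `0x1` pins a task to core 0, `0x2` pins it to core 1, and `0x3` allows either core.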
earlephilhower commented 3 days ago

I ran the PicoW for 3000 seconds, then swapped to the Pico, which ran for 3600. I think we're good with the patch; this was just the blink task running on the wrong (illegal) core.

jlbirccyn commented 3 days ago

> What's the longest you got it to run before losing the blink? I added the required core pinning and am at 30 minutes of runtime (>1800sec)

Sorry, I'm on the other side of the Atlantic, so our discussion is a bit disjointed :)

I've never had a correct run longer than 1100 s.

jlbirccyn commented 3 days ago

Thanks, Earle and Max. In the meantime, I had soldered a connector to the SWCLK, SWDIO and GND pins to plug in a PicoProbe. It seems I won't need it for this problem.