bdring / Grbl_Esp32

A port of Grbl CNC Firmware for ESP32
GNU General Public License v3.0

assertion "pbuf_free: p->ref > 0" failed when using Telnet #1364

Open rominetb44 opened 1 year ago

rominetb44 commented 1 year ago

Hi,

When using Telnet to send G-code for more than 20 minutes, Grbl crashes with this assertion:

assertion "pbuf_free: p->ref > 0" failed: file "/home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/pbuf.c", line 765, function: pbuf_free
abort() was called at PC 0x4015b297 on core 1

Backtrace: 0x40093e54:0x3ffbc730 0x40094085:0x3ffbc750 0x4015b297:0x3ffbc770 0x4017dda7:0x3ffbc7a0 0x401762d9:0x3ffbc7c0 0x40176de5:0x3ffbc840 0x40176e62:0x3ffbc870 0x400ee03e:0x3ffbc890 0x400ee0a9:0x3ffbc8b0 0x400ee16d:0x3ffbc8d0 0x400e1293:0x3ffbc8f0 0x400eb6d0:0x3ffbcd20 0x400eb38c:0x3ffbcd40 0x400da458:0x3ffbcd60 0x400903d9:0x3ffbcd80

Rebooting...
ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1216
ho 0 tail 12 room 4
load:0x40078000,len:9720
ho 0 tail 12 room 4
load:0x40080400,len:6352
entry 0x400806b8

[MSG:Grbl_ESP32 Ver 1.3a Date 20210311]
[MSG:Compiled with ESP32 SDK:v3.2.3-14-gd3e562907]
[MSG:Using machine:MakerFr GRBL 32 bits Board V2 XYZA]
[MSG:serialCheckTask Min Stack Space: 2723]
[MSG:serialCheckTask Min Stack Space: 2500]
[MSG:Axis count 4]
[MSG:RMT Steps]
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x400d5a2f  PS      : 0x00060031  A0      : 0x80082138  A1      : 0x3ffbe930
A2      : 0x3ffbbd18  A3      : 0x3ffc5c4c  A4      : 0x00000000  A5      : 0x00000000
A6      : 0x00000000  A7      : 0x00000004  A8      : 0x3ffd7154  A9      : 0x00000004
A10     : 0x00000000  A11     : 0x3ffd7154  A12     : 0x8009328a  A13     : 0x3ffd32f0
A14     : 0x00000001  A15     : 0x3ffd7154  SAR     : 0x00000000  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffb
Core 1 was running in ISR context:
EPC1    : 0x400d5a2f  EPC2    : 0x00000000  EPC3    : 0x00000000  EPC4    : 0x40088738

Backtrace: 0x400d5a2f:0x3ffbe930 0x40082135:0x3ffbe950 0x40085f01:0x3ffbe980 0x4000bfed:0x3ffd3360 0x40091561:0x3ffd3370 0x400ff74a:0x3ffd3390 0x4018e97a:0x3ffd33d0 0x4008205d:0x3ffd3400 0x400ded1b:0x3ffd3440 0x400d46ca:0x3ffd3460 0x400d2d7b:0x3ffd3480 0x400fb3ef:0x3ffd34a0 0x400903d9:0x3ffd34c0

Rebooting...
ets Jul 29 2019 12:21:46

Here is the decoded stack trace:

Decoding stack results
0x40093c80: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 155
0x40093eb1: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 170
0x4015b203: __assert_func at ../../../.././newlib/libc/stdlib/assert.c line 63
0x4017dd13: pbuf_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/pbuf.c line 765
0x40176245: lwip_recvfrom at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/sockets.c line 1176
0x40176d51: lwip_recvfrom_r at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/sockets.c line 3399
0x40176dce: lwip_recv_r at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/sockets.c line 3406
0x400ee076: WiFiClientRxBuffer::fillBuffer() at C:\Users\romai\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4/tools/sdk/include/lwip/lwip/sockets.h line 583
0x400ee0e1: WiFiClientRxBuffer::read(unsigned char*, unsigned int) at C:\Users\romai\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\libraries\WiFi\src\WiFiClient.cpp line 107
0x400ee1a5: WiFiClient::read(unsigned char*, unsigned int) at C:\Users\romai\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.4\libraries\WiFi\src\WiFiClient.cpp line 434
0x400e12c7: WebUI::Telnet_Server::handle() at C:\Users\romai\AppData\Local\Temp\arduino_build_482630\sketch\src\WebUI\TelnetServer.cpp line 152
0x400eb707: WebUI::WiFiServices::handle() at C:\Users\romai\AppData\Local\Temp\arduino_build_482630\sketch\src\WebUI\WifiServices.cpp line 167
0x400eb3c0: WebUI::WiFiConfig::handle() at C:\Users\romai\AppData\Local\Temp\arduino_build_482630\sketch\src\WebUI\WifiConfig.cpp line 443
0x400da49c: serialCheckTask(void*) at C:\Users\romai\AppData\Local\Temp\arduino_build_482630\sketch\src\Serial.cpp line 208
0x400903c9: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143

I have spent a lot of time digging into this issue without resolving it.

It seems similar to this issue: https://github.com/espressif/arduino-esp32/issues/4418

Is the TelnetServer class thread-safe?

The more I increase the value in "COMMANDS::wait(0);" in the Telnet_Server::handle() function, the longer it takes before the crash: almost 1 hour with COMMANDS::wait(5);

Could you help?

Thanks

MitchBradley commented 1 year ago

We are no longer developing or supporting Grbl_Esp32. All of our efforts have switched to FluidNC. FluidNC's telnet works much better than Grbl_Esp32's.

rominetb44 commented 1 year ago

Hello, thank you for your answer.

So I'll try FluidNC.

Regards.

rominetb44 commented 1 year ago

Hi,

I finally managed to fix the issue.

I think there are several problems.

The first comes from the low-level library used by WiFiClient, which is not thread-safe.

The second was the use of a mutex in Serial.cpp to protect data, but it only guards against access from the other core. When the threads run on the same core, there is no protection. So I added a semaphore in Serial.cpp, and it seems to fix every problem (no more corrupted or missing characters, and no more crashes).
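
For readers who want to picture the idea, here is a minimal sketch of a lock-protected byte buffer. It is not the actual patch, which is not shown in this thread. The class name GuardedRingBuffer and its capacity are hypothetical, and std::mutex is used only so the sketch compiles anywhere. On the ESP32 the same pattern would presumably use a FreeRTOS mutex semaphore (xSemaphoreCreateMutex, xSemaphoreTake, xSemaphoreGive).

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a shared receive buffer such as the one in
// Serial.cpp. Every access takes the same lock, so two tasks can never
// modify head_ and count_ at the same time, whichever core they run on.
class GuardedRingBuffer {
public:
    // Append one byte; returns false if the buffer is full.
    bool push(uint8_t c) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == kCapacity) {
            return false;
        }
        data_[(head_ + count_) % kCapacity] = c;
        ++count_;
        return true;
    }

    // Remove the oldest byte into out; returns false if the buffer is empty.
    bool pop(uint8_t& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) {
            return false;
        }
        out = data_[head_];
        head_ = (head_ + 1) % kCapacity;
        --count_;
        return true;
    }

    // Number of bytes currently stored.
    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    static constexpr std::size_t kCapacity = 256;  // assumed size
    std::mutex mutex_;
    uint8_t data_[kCapacity] = {};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

The point of the design is that the lock is taken inside the buffer itself, so a caller such as the telnet handler cannot forget it.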

Oddly, fixing only the second problem is enough to stop the crashes. It may have been a heap overflow, but the exception was the same every time, in lwip_recvfrom (pbuf_free).

I'll share my modification.

Thanks

rominetb44 commented 1 year ago

Hi,

I finally fixed the first issue as well, and there are no more crashes (tested with 20 one-hour jobs over a Telnet connection).
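
The fix for the first issue is not posted in this thread. As an illustration only, one common way to cope with a client object that is not thread-safe is to route every read and write through a single lock. The sketch below assumes a client type with read(uint8_t*, size_t) and write(const uint8_t*, size_t) members, which is how WiFiClient is called in the stack trace above. SerializedClient and FakeClient are hypothetical names, and std::mutex again stands in for a FreeRTOS semaphore.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical wrapper: forwards read() and write() to the wrapped client
// while holding one lock, so two tasks never enter the client at once.
template <typename Client>
class SerializedClient {
public:
    explicit SerializedClient(Client& client) : client_(client) {}

    int read(uint8_t* buf, std::size_t len) {
        std::lock_guard<std::mutex> lock(mutex_);
        return client_.read(buf, len);
    }

    std::size_t write(const uint8_t* buf, std::size_t len) {
        std::lock_guard<std::mutex> lock(mutex_);
        return client_.write(buf, len);
    }

private:
    Client& client_;
    std::mutex mutex_;
};

// Hypothetical in-memory client, used only to exercise the wrapper.
struct FakeClient {
    uint8_t stored[16] = {};
    std::size_t stored_len = 0;

    // Keep up to sizeof(stored) bytes and report how many were kept.
    std::size_t write(const uint8_t* buf, std::size_t len) {
        std::size_t n = len < sizeof(stored) ? len : sizeof(stored);
        for (std::size_t i = 0; i < n; ++i) {
            stored[i] = buf[i];
        }
        stored_len = n;
        return n;
    }

    // Copy back up to len of the stored bytes and report how many.
    int read(uint8_t* buf, std::size_t len) {
        std::size_t n = len < stored_len ? len : stored_len;
        for (std::size_t i = 0; i < n; ++i) {
            buf[i] = stored[i];
        }
        return static_cast<int>(n);
    }
};
```

This only helps if every task uses the wrapper. Any code that still calls the underlying client directly bypasses the lock.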

Thanks