adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org

Safemode due to heap allocation failure (since PR#9325) #9362

Closed bablokb closed 2 weeks ago

bablokb commented 3 months ago

CircuitPython version

since PR#9325

Code/REPL

def send_file_to_host(src_filename, dst_file, filesize, buf_size):
  import sys
  import binascii
  try:
    with open(src_filename, 'rb') as src_file:
      bytes_remaining = filesize
      # hexlify doubles the length, so read only half a buffer per chunk
      buf_size = buf_size // 2
      while bytes_remaining > 0:
        read_size = min(bytes_remaining, buf_size)
        buf = src_file.read(read_size)
        sys.stdout.write(binascii.hexlify(buf))
        bytes_remaining -= read_size
        # wait for the host to acknowledge the chunk with ACK (0x06)
        while True:
          char = sys.stdin.read(1)
          if char:
            if char == '\x06':
              break
            # echo anything that is not the ACK
            sys.stdout.write(char)
    return True
  except:
    return False
# send boot_out.txt (167 bytes) in chunks of 32 hex characters
try:
  output = send_file_to_host('boot_out.txt', None, 167, 32)
except Exception as ex:
  print(ex)
  output = None
if output is None:
  print("None")
else:
  print(output)

Behavior

Safemode error-message:

ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:396
ho 0 tail 12 room 4
load:0x40078000,len:13904
load:0x40080400,len:4
load:0x40080404,len:3156
entry 0x40080558
Serial console setup

Auto-reload is off.
Running in safe mode! Not running saved code.

You are in safe mode because:
CircuitPython core code crashed hard. Whoops!
Unable to allocate to the heap.
Please file an issue with your program at github.com/adafruit/circuitpython/issues.
Press reset to exit safe mode.

Press any key to enter the REPL. Use CTRL-D to reload.

Description

This is code running from the REPL. The code works several times in a row, but eventually it crashes. This is hard to reproduce: sometimes two iterations are enough, sometimes more are needed.

Additional information

I have this problem with various ESP32 boards since #9325. When I build up to and including 7c85f6a15af330770d7d7ccf2700d5eb9f83c222, the problem does not exist (#9325 is the next commit).

I could not reproduce the problem with a QTPY-ESP32-C3.

I suspect this is an issue with the switch to IDF-5.2.2. I know that activating BLE consumes a lot of memory, but my code does not do a lot, it basically prints boot_out.txt to the REPL.

dhalbert commented 3 months ago

I tried this test program, modifying it a bit to send a shorter amount of data (because boot_out.txt wasn't that big), and could not reproduce the problem in recent builds. That's a hopeful sign.

bablokb commented 3 months ago

It seems to be more complicated. Running code inside the REPL is fine, but sending code to the REPL for execution causes a memory leak:

> cpshell -L en_US repl 'import gc\; gc.mem_free() \;'
Entering REPL. Use Control-X to exit.
>

Adafruit CircuitPython 9.1.0-beta.3-53-g1188d67263 on 2024-07-02; sunton_esp32_2432S032C with ESP32
>>> 
>>> import gc; gc.mem_free()
118176
> cpshell -L en_US repl 'import gc\; gc.mem_free() \;'
Entering REPL. Use Control-X to exit.
>

Adafruit CircuitPython 9.1.0-beta.3-53-g1188d67263 on 2024-07-02; sunton_esp32_2432S032C with ESP32
>>> 
>>> import gc; gc.mem_free()
105888
> cpshell -L en_US repl 'import gc\; gc.mem_free() \;'
Entering REPL. Use Control-X to exit.
>

Adafruit CircuitPython 9.1.0-beta.3-53-g1188d67263 on 2024-07-02; sunton_esp32_2432S032C with ESP32
>>> 
>>> import gc; gc.mem_free()
93600

Repeating this a few more times will eventually trigger the heap allocation failure and safemode.
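
For reference, the three `gc.mem_free()` readings above can be compared directly. The sketch below only uses the numbers from the sessions shown; the estimate of remaining sessions is a hypothetical extrapolation that assumes the loss per session stays constant and ignores fragmentation and other allocations.

```python
# Free-heap readings (bytes) copied from the three REPL sessions above.
readings = [118176, 105888, 93600]

# Drop in free heap between consecutive sessions.
deltas = [before - after for before, after in zip(readings, readings[1:])]

# Hypothetical estimate: number of further sessions that would fit if
# every session kept losing the same amount (an assumption, not a
# measurement).
per_session = deltas[-1]
sessions_left = readings[-1] // per_session

print("loss per session:", deltas)
print("estimated sessions left:", sessions_left)
```

Both steps lose the same amount, which suggests a fixed-size allocation that is not released between sessions rather than random fragmentation.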

Some notes:

tannewt commented 3 months ago

Think the web workflow is leaking memory?

bablokb commented 3 months ago

I am not using the web workflow. There is no settings.toml at all.

dhalbert commented 2 months ago

I tried your cpshell example above with a build made after #9409 was merged, and could not get it to act up. Could you try an Absolute Newest build that includes #9409 or later?

dhalbert commented 2 months ago

I just released 9.1.0-rc.0 which includes #9409.

bablokb commented 2 months ago

No change. Even the mem-free figures are very similar (the first two even identical).

Is there anything I can do to help debugging? I could probably create and run a debug build.

dhalbert commented 2 months ago

@bablokb Thanks. We could really use a minimal example that reproduces the problem on an Adafruit or Espressif board: I don't have the Sunton board you are testing on, for instance. As I mentioned I could not reproduce the problem in the first post on a QT Py ESP32-C3.

bablokb commented 2 months ago

It is only a problem on ESP32 boards; there is no problem with the ESP32-C3. I don't own an Adafruit ESP32 board, but I did try this on other ESP32 boards (with no official support for CP yet) and have the same issue.

RetiredWizard commented 2 months ago

When I paste code into a Ctrl-E REPL on the ItsyBitsy ESP32, it consistently starts dropping characters after 255 characters. After skipping about 275 characters, another 124 characters are correctly received, then 937 characters are dropped, followed by another 152 properly transmitted characters, and so on.

This feels to me like the flow control between the CH9102 USB-to-Serial Converter chip and the ESP32 chip isn't working. From some quick searching there are two places I found flow control being configured, /ports/espressif/common-hal/busio/UART.c and /lib/tinyusb/hw/bsp/espressif/boards/family.c. I haven't had a chance to dig into either location but I plan on looking at that code next.
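
A host-side way to locate dropped runs like the ones described above is to compare the pasted text with what the REPL echoed back. This is a minimal sketch, assuming both strings have already been captured on the host; `find_dropped_ranges` is a hypothetical helper, not part of CircuitPython or cpshell, and the sample strings are made up for illustration.

```python
import difflib


def find_dropped_ranges(sent, received):
    """Return (start, end) index pairs of `sent` that are missing from `received`."""
    # autojunk=False keeps the matcher from ignoring frequent characters
    # in long inputs such as pasted source code.
    matcher = difflib.SequenceMatcher(None, sent, received, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            dropped.append((i1, i2))
    return dropped


# Illustrative data only: "ghij" and "uvw" are missing from the echo.
sent = "abcdefghijklmnopqrstuvwxyz"
received = "abcdefklmnopqrstxyz"
print(find_dropped_ranges(sent, received))
```

With highly repetitive input the alignment can be ambiguous, so the reported ranges are best treated as approximate.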

RetiredWizard commented 2 months ago

I suspect Dan has explained this to me at least once already, but all the board/chip/uart/etc combinations are too much for me to keep a handle on.... Looking at the ItsyBitsy ESP32 circuit diagram, there doesn't appear to be any hardware flow control connected between the CH9102 and ESP32 which I guess means that the ESP32 is expected to be able to keep up with the data stream.

By slowing down the UART baud rate I was able to increase the number of characters that were transmitted before characters started being dropped, but I slowed it all the way down to 300 baud and still saw the dropped data behavior.

I've been staring at the "paste mode" input loop in pyexec.c which must be where the data is getting lost but I can't see any issues. I'm now wondering if background tasks could be running that are interrupting the input loop but it's going to take me a bit of digging to see if that line of thought has any potential.

RetiredWizard commented 2 months ago

Well, by disabling the character echo in the paste mode input loop, I was able to successfully paste a 12.5k python script :grin:. I have to take a break for a bit, but the next step is to trace through wherever mp_hal_stdout_tx_str goes and see what could be hanging up the works.

RetiredWizard commented 2 months ago

I believe the paste issue I'm having is caused by heavy UART traffic and is probably unrelated to the memory issues you are now reporting. I'll open a new issue for the paste problem and reference this just in case it turns out there is some overlap.

tannewt commented 3 weeks ago

The BLE workflow is leaking data every reset. #9599 also crashes due to this.

dhalbert commented 2 weeks ago

Should be fixed by #9616. Please reopen if you find otherwise.