adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
MIT License

Pico-W memory bloat #9085

Open bablokb opened 4 months ago

bablokb commented 4 months ago

CircuitPython version

Adafruit CircuitPython 9.0.0 on 2024-03-19; Raspberry Pi Pico W with rp2040

Code/REPL

import gc
print(f"free memory at start: {gc.mem_free()}")

main.py output:
free memory at start: 126112

Behavior

With CP 9.0.0, we have about 123K of free memory. With 8.0.5, we had 152K (139K with 8.1.0/8.2.10). So we lost about 20% of available memory from one major release to the next.
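The percentages can be rechecked with a little arithmetic. A minimal sketch; the KiB figures are the rounded values quoted in this issue, and "K" is assumed to mean 1024 bytes.

```python
# Free-heap figures reported in this issue, in KiB (rounded as quoted).
FREE_KIB = {"8.0.5": 152, "8.1.0": 139, "8.2.10": 139, "9.0.0": 123}


def loss_percent(old_kib, new_kib):
    """Percentage of free heap lost when going from old_kib to new_kib."""
    return 100.0 * (old_kib - new_kib) / old_kib


# The 126112 bytes printed by gc.mem_free() on 9.0.0, expressed in KiB.
print(round(126112 / 1024, 1))

for old, new in (("8.0.5", "8.1.0"), ("8.1.0", "9.0.0"), ("8.0.5", "9.0.0")):
    pct = loss_percent(FREE_KIB[old], FREE_KIB[new])
    print(f"{old} -> {new}: {pct:.1f}% less free memory")
```

This prints 123.2 KiB and losses of 8.6%, 11.5% and 19.1%, so "about 20%" is a fair rounding of the 8.0.5 to 9.0.0 drop.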

This really hurts. Using, e.g., an HTTPS API and updating a display is already fatal for a number of my programs.

I do know that CP has gained a number of new features, but I would prefer to also have a slim build with less features but more available memory. Any suggestions regarding optimized settings are highly welcome.

Description

No response

Additional information

No response

bill88t commented 4 months ago

The primary ram hog is the wifi stack really. I don't think much (ram) can be gained from custom builds. The difference between CP versions is probably just from the underlying lib version changes.

Some tips:

If these are not enough for your applications, you may wanna look into ESP32(-S3) boards, that ideally have PSRAM. 512kb is nice, but having 2 or more megabytes of ram is a lot better. And the bulk cost isn't that much different really.

dhalbert commented 4 months ago

Added firmware features do not necessarily use up RAM immediately. The difference may be due to the new "split heap" storage allocation. We would need to track down if the RAM is actually used up, or whether gc.mem_free()'s report is as helpful as it appears.

bablokb commented 4 months ago

I have already done all of this (e.g. using pre-compiled modules) and have carefully garbage-collected my objects. All my tests use exactly the same code and library versions. I have instrumented every part of my program and have detailed data for the various modules I use.

The data clearly shows that wifi and other module usage does not differ greatly between CP versions. The big initial difference of free-mem at the beginning of the program carries through the whole runtime of the program.

In detail, I configure the buses, add sensors, take readings, write to the SD card, send via LoRa and finally update a display. After subtracting the initial free-memory value, the mem_free readings at every stage of the program are almost the same across versions.
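Comparing free memory stage by stage, as described above, can be done with a small helper. This is a sketch, not bablokb's actual instrumentation; `MemLog` and `heap_stats` are hypothetical names. It relies on `gc.mem_free()` and `gc.mem_alloc()`, which CircuitPython provides; on CPython they are missing and the helper records `None` so the sketch still runs.

```python
import gc


def heap_stats():
    """Collect garbage, then return (free, allocated) heap bytes.

    gc.mem_free() and gc.mem_alloc() exist on CircuitPython but not on
    CPython; there this returns (None, None) so the sketch still runs.
    """
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)
    mem_alloc = getattr(gc, "mem_alloc", None)
    if mem_free is None or mem_alloc is None:
        return None, None
    return mem_free(), mem_alloc()


class MemLog:
    """Hypothetical helper: record heap figures at named program stages."""

    def __init__(self):
        self.samples = []  # list of (stage, free, alloc)

    def mark(self, stage):
        free, alloc = heap_stats()
        self.samples.append((stage, free, alloc))
        print(f"{stage}: free={free} alloc={alloc}")

    def deltas(self):
        """Change in free heap from each stage to the next (None if unknown)."""
        result = []
        for prev, cur in zip(self.samples, self.samples[1:]):
            if prev[1] is None or cur[1] is None:
                result.append((cur[0], None))
            else:
                result.append((cur[0], cur[1] - prev[1]))
        return result


log = MemLog()
log.mark("start")
# ... configure buses, add sensors, take readings ...
log.mark("after sensors")
print(log.deltas())
```

Running the same script on two firmware versions and subtracting each log's "start" value from the later stages shows whether the gap between versions is fixed at startup or grows at runtime.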

And I don't think this is only due to split heap, because the large drop from 8.0.5 to 8.1.0 (152K->139K) happened before split heap was introduced. But the second big drop from 8.2.10 to 9.0.0 (139K->123K) might be related. In any case, the output of gc.mem_free() might not be exact, but it correlates very well with the MemoryErrors I see.

According to the release notes, there were two major changes from 8.0.5 to 8.1.0: one was the addition of PICODVI, but turning it off only saves 4K. The second was the SDK update. And maybe something I missed.

ESP32-S2 or S3 are fine, but the price difference is huge for large scale deployments.

tannewt commented 4 months ago

The starting heap memory is largely dictated by the static memory allocations. Features like picodvi can use this space up because it includes all code that needs to run when flash is unavailable (like the dvi conversion code).

To dig into this, you'll want to do your own build and look at the .map file. It'll lay out how much of RAM is used statically and what symbols take up that space.
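To follow this suggestion without reading the .map file by hand, the top-level output sections can be pulled out with a regular expression. A minimal sketch, assuming GNU ld's usual map layout, where output sections begin in column 0 followed by their address and size; `output_sections` and `largest` are hypothetical helper names, and the sample text uses made-up values.

```python
import re

# GNU ld map files list each output section starting in column 0, e.g.
#   .data           0x20000110     0x1a8c load address 0x100d5b00
# Input sections are indented, so they do not match. Sections whose long
# names push the address/size onto the next line are skipped by this sketch.
SECTION_RE = re.compile(r"^(\.[\w.]+)\s+0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)")


def output_sections(map_text):
    """Yield (name, address, size) for each top-level output section."""
    for line in map_text.splitlines():
        m = SECTION_RE.match(line)
        if m:
            yield m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def largest(map_text, n=10):
    """The n biggest output sections, largest first."""
    return sorted(output_sections(map_text), key=lambda s: s[2], reverse=True)[:n]


# Made-up excerpt in map-file layout, for illustration only.
sample = """\
.data           0x20000110     0x1a8c load address 0x100d5b00
 .data.foo      0x20000110       0x10 build/foo.o
.bss            0x20001b9c     0x9f20
"""
for name, addr, size in largest(sample):
    print(f"{name:16} 0x{addr:08x} {size:7d}")
```

On a real build, passing the text of the firmware's map file (its exact name and location depend on the build directory) to `largest` lists the biggest sections; running it on the maps of two versions and comparing the RAM-addressed entries narrows down where static RAM went.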

bill88t commented 4 months ago

One more tiny tip I have is that you may wanna consider setting CIRCUITPY_PYSTACK_SIZE to something like 1792 or even lower. For every 256 you remove from this value it should give you 256 bytes ram, but lose a few stack frames. If you have a lot of nested calls (or your libs do), you will see pystack errors with this change though. You can fine tune it and have exactly the size you need though.
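To fine-tune CIRCUITPY_PYSTACK_SIZE, it helps to measure how many nested calls still fit after each change. A rough probe, assuming (as in MicroPython) that running out of pystack or C stack raises RuntimeError; `max_call_depth` is a hypothetical helper, and because it uses a minimal frame, real code with more locals will run out sooner.

```python
def max_call_depth(limit=10000):
    """Return how deep a trivial function can recurse before the VM refuses.

    On CircuitPython this is bounded by the pystack (or the C stack); on
    CPython by the recursion limit. Real code uses larger frames, so treat
    the result as an upper bound, not a safe budget.
    """
    reached = [0]  # a list so the nested function can update it

    def probe(n):
        reached[0] = n
        if n < limit:
            probe(n + 1)

    try:
        probe(1)
    except RuntimeError:  # CPython's RecursionError subclasses RuntimeError
        pass
    return reached[0]


print("max depth:", max_call_depth())
```

Comparing the reported depth before and after lowering the setting shows how many frames each 256-byte step costs for this kind of call.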

dmcomm commented 4 months ago

If I understand right, you can save some RAM by doing a custom build with custom frozen modules: https://learn.adafruit.com/building-circuitpython/adding-frozen-modules

Things like "turning off PICODVI saves 4K" are useful to know.

The extra RAM from compiling .py files gets freed afterwards, so precompiling to .mpy doesn't really help with RAM, provided you have enough to complete the compilation.
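To check on real hardware how much heap an import keeps once compilation is done, free memory can be sampled around it. A sketch; `import_cost` and `free_bytes` are hypothetical helpers, they rely on `gc.mem_free()` (present on CircuitPython, absent on CPython, where the cost is reported as `None`), and they only see the retained cost after `gc.collect()`, not the transient peak during compilation.

```python
import gc


def free_bytes():
    """gc.mem_free() after a collection, or None where it does not exist."""
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)
    return mem_free() if mem_free else None


def import_cost(name):
    """Import a top-level module and return (module, heap bytes retained).

    A module that is already imported is not loaded again, so its cost
    shows up as (close to) zero; measure each one in a fresh session.
    """
    before = free_bytes()
    module = __import__(name)
    after = free_bytes()
    cost = None if before is None or after is None else before - after
    return module, cost
```

Comparing the .py and .mpy versions of the same library this way would test the point above: if the retained cost is about the same, precompiling only matters when the compilation peak itself does not fit.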

dhalbert commented 4 months ago

I did some looking at this, first by just using arm-none-eabi-nm, and then by using https://github.com/google/bloaty (which I had never used before).

Here is some info, first comparing 8.1.0 to 8.0.5. Notice the growth in the .data and .itcm sections, which are RAM. .itcm is code in RAM (right?). .rodata can be in flash.

I need to build without picodvi turned on, which I didn't do consistently yet.

$ ~/repos/bloaty/build/bloaty ~/build-raspberry_pi_pico_w_8.1.0/firmware.elf -- ~/build-raspberry_pi_pico_w_8.0.5/firmware.elf
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +127%  +231Ki  +127%  +231Ki    .rodata
  +3.2% +26.4Ki  +3.2% +26.4Ki    .text
   +19% +24.3Ki  [ = ]       0    [Unmapped]
  +6.5% +22.7Ki  [ = ]       0    .symtab
  +7.4% +14.8Ki  [ = ]       0    .strtab
  +222% +6.68Ki  +222% +6.67Ki    .data
   +48% +3.67Ki   +48% +3.67Ki    .itcm
  [NEW] +2.02Ki  [ = ]       0    .stack1_dummy
  [NEW] +1.17Ki  [NEW] +1.17Ki    .scratch_x
  [ = ]       0  +1.3%    +872    .bss
   +67%     +64  [ = ]       0    [ELF Program Headers]
  [ = ]       0  [NEW]     +52    [LOAD #4 [RW]]
  [NEW]     +48  [ = ]       0    .uninitialized_data
  +9.1%     +40  [ = ]       0    .debug_frame
  +2.1%      +4  [ = ]       0    .ram_vector_table
  [DEL]     -12  [DEL]     -12    .uninitialized_ram.magic_location
 -12.2%     -32  [ = ]       0    .shstrtab
  -4.2%     -40  [ = ]       0    [ELF Section Headers]
 -63.9% -1.50Ki -64.0% -1.50Ki    .dtcm_bss
  [DEL]  -219Ki  [DEL]  -219Ki    .big_const

And now 9.0.0 vs 8.1.0. Again, .data grew. I haven't yet looked at the individual symbols, which bloaty can also do with the -d flag, like -d symbols,sections. I am just starting to learn how to use it. I want to ignore certain sections: I don't know yet if I can do that.

$ ~/repos/bloaty/build/bloaty ~/build-raspberry_pi_pico_w_9.0.0/firmware.elf -- ~/build-raspberry_pi_pico_w_8.1.0/firmware.elf
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +7.9% +66.5Ki  +7.9% +66.5Ki    .text
  +7.5% +28.0Ki  [ = ]       0    .symtab
  +7.9% +17.1Ki  [ = ]       0    .strtab
  +2.6% +10.8Ki  +2.6% +10.8Ki    .rodata
   +88% +8.53Ki   +88% +8.54Ki    .data
  [ = ]       0  +9.8% +6.47Ki    .bss
  +123%    +592  [ = ]       0    .debug_frame
  [NEW]    +264  [NEW]    +260    .uninitialized
  +4.3%     +40  [ = ]       0    [ELF Section Headers]
  +7.8%     +18  [ = ]       0    .shstrtab
  +3.9%      +3  [ = ]       0    .comment
  -2.0%      -4  [ = ]       0    .ram_vector_table
 -21.6%     -11  [ = ]       0    .ARM.attributes
  -0.8%     -16  [ = ]       0    .stack1_dummy
  -4.9%    -568  -4.9%    -568    .itcm
  [DEL]    -868   +25%    +216    .dtcm_bss
 -25.4% -38.5Ki  [ = ]       0    [Unmapped]
  +4.5% +91.8Ki  +6.8% +92.2Ki    TOTAL
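bloaty's diff table mixes flash and RAM sections. To total only the RAM-resident ones, the VM SIZE column can be filtered by section name. A sketch; the regex assumes the two-column diff layout shown above, and `RAM_SECTIONS` is an assumption about which sections are placed in RAM on this RP2040 build (the thread later notes that .data, .bss and .itcm take RAM), so verify it against the addresses in the .map file.

```python
import re

# One row of bloaty's diff table: file %, file delta, VM %, VM delta, name.
ROW_RE = re.compile(
    r"^\s*(?:\[ = \]|\S+)\s+[+-]?\d+(?:\.\d+)?(?:Ki|Mi)?"
    r"\s+(?:\[ = \]|\S+)\s+(?P<vm>[+-]?\d+(?:\.\d+)?)(?P<unit>Ki|Mi)?"
    r"\s+(?P<name>.+?)\s*$"
)
UNIT = {None: 1, "Ki": 1024, "Mi": 1024 * 1024}

# Assumption: sections that live in RAM on this build.
RAM_SECTIONS = {".data", ".bss", ".itcm", ".dtcm_bss", ".scratch_x", ".uninitialized"}


def vm_deltas(table):
    """Map section name -> VM size change in bytes, parsed from bloaty output."""
    out = {}
    for line in table.splitlines():
        m = ROW_RE.match(line)
        if m:
            out[m.group("name")] = float(m.group("vm")) * UNIT[m.group("unit")]
    return out


def ram_delta(table):
    """Approximate change in static RAM (bloaty rounds to 0.01Ki)."""
    return sum(v for k, v in vm_deltas(table).items() if k in RAM_SECTIONS)


if __name__ == "__main__":
    # RAM-relevant rows from the 9.0.0 vs 8.1.0 comparison.
    sample = """
   +88% +8.53Ki   +88% +8.54Ki    .data
  [ = ]       0  +9.8% +6.47Ki    .bss
  [NEW]    +264  [NEW]    +260    .uninitialized
  -4.9%    -568  -4.9%    -568    .itcm
  [DEL]    -868   +25%    +216    .dtcm_bss
"""
    print(f"{ram_delta(sample) / 1024:.1f} KiB more static RAM")
```

Under that section assumption, the two tables give roughly +10.9 KiB (8.0.5 to 8.1.0) and +14.9 KiB (8.1.0 to 9.0.0) of static RAM, in the same range as the drops in gc.mem_free() reported at the top of the issue.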

bablokb commented 4 months ago

Thank you all for looking into this.

@tannewt : I could create the map-files for the different versions and then compare them. What exactly am I looking for? All .data.xxx sections?

bablokb commented 4 months ago

Disabling picodvi and usb-host brings me back to 143k. The build fails with CIRCUITPY_USB_HOST = 0, I am preparing a patch for that.

bablokb commented 4 months ago

Created PR: #9091

tannewt commented 4 months ago

> Thank you all for looking into this.
>
> @tannewt : I could create the map-files for the different versions and then compare them. What exactly am I looking for? All .data.xxx sections?

.bss and .itcm sections will also take space in RAM. The address can tell you what memory region everything is in.
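Following the hint that addresses reveal the memory region, sections can be bucketed by address. A sketch for the RP2040, assuming the datasheet address map (16 KiB ROM at 0, XIP flash at 0x10000000, 256 KiB of striped SRAM at 0x20000000, followed by the 4 KiB SRAM4/SRAM5 banks that the pico-sdk default linker layout uses for scratch_x and scratch_y); the (name, address, size) tuples would come from the .map file, the example values are made up, and `region_of`/`ram_usage` are hypothetical names.

```python
# Hypothetical region table for the RP2040, based on the datasheet address
# map and the pico-sdk default linker layout: (name, start, end-exclusive).
REGIONS = (
    ("rom", 0x00000000, 0x00004000),
    ("flash (XIP)", 0x10000000, 0x11000000),
    ("sram", 0x20000000, 0x20040000),
    ("scratch_x", 0x20040000, 0x20041000),
    ("scratch_y", 0x20041000, 0x20042000),
)
RAM_REGIONS = {"sram", "scratch_x", "scratch_y"}


def region_of(address):
    """Name of the region containing address, or "other"."""
    for name, start, end in REGIONS:
        if start <= address < end:
            return name
    return "other"


def ram_usage(sections):
    """Sum sizes per section name for (name, address, size) tuples in RAM."""
    totals = {}
    for name, address, size in sections:
        if region_of(address) in RAM_REGIONS:
            totals[name] = totals.get(name, 0) + size
    return totals


# Example tuples in the shape a .map file provides (values are made up).
example = [
    (".data", 0x20000110, 0x1A8C),
    (".text", 0x10000100, 0x40000),
    (".scratch_x", 0x20040000, 0x480),
]
print(ram_usage(example))
```

Only the address column (the run-time location) is used; a section such as .data also occupies flash at its load address, but that copy does not reduce the heap.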

bablokb commented 4 months ago

Tested this with a Trinket-M0 (SAMD21):

This is interesting insofar as the free-memory ratio 9.0.1/8.2.10 is 95%. And when I disable PICODVI and USB_HOST, I get 94% for the Pico-W. So my assumption is that this 5% loss comes from the MicroPython merge. Drilling down into the symbols might give more insight.

I also worked through the .data, .bss and .itcm sections, but there is not much room for improvement left. Disabling mdns, webworkflow and BLE gives some 100+ bytes, but that is not worth it.

What I would do now: Pimoroni has a number of boards (mainly with e-ink displays) that have an integrated Pico-W. Unless you have advanced soldering skills, the pins are not accessible. So disabling PICODVI and USB_HOST on these boards would probably not hurt anybody but would reclaim 20K of RAM. Maybe there are other boards as well that are candidates for this change, but I don't know.

@tannewt , @dhalbert : your opinion? I would create a pull-request for the change.

dhalbert commented 4 months ago

> What I would do now: Pimoroni has a number of boards (mainly with e-ink displays) that have an integrated Pico-W. Unless you have advanced soldering skills, the pins are not accessible. So disabling PICODVI and USB_HOST on these boards would probably not hurt anybody but would reclaim 20K of RAM. Maybe there are other boards as well that are candidates for this change, but I don't know.

That's a fine idea, if there is no way for those boards to use PICODVI and USB host in any case. Thanks!

bablokb commented 3 months ago

Just an update: I did some more tests to find out why free memory decreased. One reason is a patch for tinyusb. That patch fixed some problems with the Pico and the Pi4/Pi400 but had the side-effect that it moved a lot of code from flash to ram (about 4K, mainly interrupt handlers, so there is good reason for this change).

With some hacks I managed to push this code back to flash (where it was in 8.0.5). The sections .data, .bss and .itcm did shrink the expected amount, but the net effect on gc.mem_free() was zero. Maybe someone can explain why?!

One option that might be useful for other users could be a "Raspberry Pi Pico W Slim" build without PICODVI and USB_HOST. In fact, I am running all my projects that use https with such a slim build now.

dhalbert commented 3 months ago

@tannewt Do you think we should just turn off picodvi and usb_host on the Pico W? Or make a separate build?

tannewt commented 3 months ago

> @tannewt Do you think we should just turn off picodvi and usb_host on the Pico W? Or make a separate build?

I think we should leave the Pico W build as-is. Turning off picodvi and usb_host prevents anyone from using them. The build as-is has soft memory limits to what it can do but that will always be true. In the long term we can make more of this memory use dynamic (probably not the ram functions) so that one can turn off features (like USB MIDI) to save ram.

dmcomm commented 3 months ago

> The build as-is has soft memory limits to what it can do but that will always be true.

It's just unfortunate that adding features makes existing projects stop working, and Pico W is more affected than other RP2040 boards because so much memory is reserved for WiFi. I wonder how much Bluetooth will need. If you do later add the ability to reclaim some dynamically, that will surely help.

bablokb commented 3 months ago

Although @tannewt is correct that removing picodvi and usb_host would prevent (most) users from using them, this does not mean that we cannot have a second build with more available memory.

My personal guess is that more users are hit by the memory limits than would be hit by removing picodvi/usb_host, but having two builds with a choice for users would certainly be a valid option.

dhalbert commented 3 months ago

@bablokb re https://github.com/adafruit/circuitpython/issues/9085#issuecomment-2078702631, so when you did turn them off, are you able to do all the things you did in 8.x.x?

jerryneedell commented 3 months ago

FYI - When I removed the two modules AND switched to using .mpy version of the libraries, I am able to connect to and interact with AIO as I had desired. However, when I try to add in any use of the ov5640 camera, I quickly run into memory issues again. I may try removing even more modules to see if I can get anywhere but I don't think the picow is a viable candidate for the projects I have in mind.

anecdata commented 3 months ago

@jerryneedell could you add the camera in CP 8?

If the board order on the circuitpython.org downloads page is any indication, Pico W is a very popular board. I tend to agree that memory is a high priority for network users.

jerryneedell commented 3 months ago

> @jerryneedell could you add the camera in CP 8?
>
> If the board order on the circuitpython.org downloads page is any indication, Pico W is a very popular board. I tend to agree that memory is a high priority for network users.

I have not tried it with CP8. The new camera boards are specifically for the Pico/Pico W and I've just started working with them.

bablokb commented 3 months ago

> @bablokb re #9085 (comment), so when you did turn them off, are you able to do all the things you did in 8.x.x?

No. It helps, but using wifi and displayio still uses too much memory even after simplifying what I write to the display.

That is why I tried to move the tinyusb-code back to flash. But that did not change the memory available to python-programs (see above).

I also deactivated all other stuff I don't need (e.g. web-workflow), but that is not worth the effort, it saves a few bytes, but it is no game-changer. What I could try is to move a number of libraries to the firmware, but sticking with 8.0.5 is simpler.

bablokb commented 3 months ago

> I don't think the picow is a viable candidate for the projects I have in mind.

For a number of projects I have switched to the Waveshare ESP32-S3 Pico. It has the form-factor and pinout of the Pico. I have my own "compatibility" build for that board using Pico-W pin-names and all my programs for the Pico work without change.

But although this board is cheap compared to other S3 boards, it is much more expensive than the Pico-W, so not suitable for this other project which is deployed in much higher volume.

jerryneedell commented 3 months ago

> I don't think the picow is a viable candidate for the projects I have in mind.
>
> For a number of projects I have switched to the Waveshare ESP32-S3 Pico. It has the form-factor and pinout of the Pico. I have my own "compatibility" build for that board using Pico-W pin-names and all my programs for the Pico work without change.
>
> But although this board is cheap compared to other S3 boards, it is much more expensive than the Pico-W, so not suitable for this other project which is deployed in much higher volume.

Thanks for the board information. That may be worth looking into to make the pico form factor cameras useful. I am just making things for myself so cost/board is not really a factor.