esp-rs / esp-storage

implementation of embedded-storage traits to access unencrypted ESP32 flash
Apache License 2.0

Hang on write when using esp-wifi in the same crate #21

Open karlri opened 1 year ago

karlri commented 1 year ago

I attempted to make an OTA flasher for esp32 using esp-wifi and esp-storage. Using only esp-storage, I'm able to read the partition table and I'm able to write the "otadata" partition and make it boot another partition.

However, as soon as I add esp-wifi as a dependency, the embedded-storage write seems to hang. This happens even if I don't call any function from esp-wifi; I just add the dependency and the .cargo/config changes it needs to compile.

I already tried bumping the HAL versions in esp-storage and adapting main for the &mut peripheral_clocks_control change. This crate works with the new HAL, but the issue persists: as soon as esp-wifi is added as a dependency, write hangs. Very strange.

Any ideas what might be going on or what I might try?

bjoernQ commented 1 year ago

If only adding the dependency is causing this, then it's very odd. If you actually initialize esp-wifi in your code it makes sense.

There is https://github.com/bjoernQ/esp32c3-ota-experiment which is for ESP32-C3 but I just pushed the esp32 branch.

On ESP32 this library needs to run our own version of the flash functions from RAM with flash detached. esp-wifi contains a scheduler which, when it triggers, runs code from flash, and that is not good while the flash is detached.

In my example it seems to work because it's using an unreleased version of esp-storage which by default uses the critical-section feature (to avoid any interrupts while flash is detached). My code is using quite old versions of esp-hal and esp-wifi; if any later change stopped it from working, we need to find the problematic commit.
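To illustrate why masking interrupts matters here, the following is a toy host-side model of the situation described above: a write routine detaches flash, a scheduler tick arrives mid-write, and the tick handler lives in flash. It is only a sketch under those assumptions. `Chip`, `Outcome`, and every method name are hypothetical and are not part of esp-storage, esp-wifi, or esp-hal.

```rust
/// Result of a simulated flash write.
#[derive(Debug, PartialEq)]
enum Outcome {
    Completed,
    Hung,
}

/// Toy model of a single-core chip; all names are hypothetical.
struct Chip {
    flash_attached: bool,
    interrupts_masked: bool,
    tick_pending: bool, // a scheduler tick is waiting to fire
}

impl Chip {
    fn new() -> Self {
        Chip { flash_attached: true, interrupts_masked: false, tick_pending: false }
    }

    /// Deliver a pending tick if interrupts allow it. The handler is assumed
    /// to live in flash, so running it while flash is detached hangs the chip.
    fn service_interrupts(&mut self) -> Outcome {
        if self.tick_pending && !self.interrupts_masked {
            self.tick_pending = false;
            if !self.flash_attached {
                return Outcome::Hung;
            }
        }
        Outcome::Completed
    }

    /// Simulate a write, optionally wrapped in a critical section.
    fn write_flash(&mut self, critical_section: bool) -> Outcome {
        self.interrupts_masked = critical_section;
        self.flash_attached = false; // RAM-resident write routine starts
        self.tick_pending = true; // a tick arrives in the middle of the write
        if self.service_interrupts() == Outcome::Hung {
            return Outcome::Hung;
        }
        self.flash_attached = true; // write finished, flash is usable again
        self.interrupts_masked = false; // leave the critical section
        self.service_interrupts() // the deferred tick now runs safely
    }
}
```

In this model `Chip::new().write_flash(false)` ends in `Outcome::Hung`, while `Chip::new().write_flash(true)` defers the tick until flash is attached again and completes.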

I just tested it again. It seems I can't get esptool.py to generate a valid image, but the flashing itself seems to work fine.

karlri commented 1 year ago

After some trial and error, I have got it working and have concluded the following:

lto=false
and
lto="off"

are not the same. The first is actually "thin local lto" which is the default for the release profile. The latter turns off lto entirely and causes write to hang on esp32. I used the latter because of a recommendation from esp-wifi:

[profile.release]
# Explicitly disable LTO which the Xtensa codegen backend has issues
lto = "off"

Now I guess I'll investigate what happens to esp-wifi when thin local lto is enabled. Maybe it has improved since issues about it were opened.

karlri commented 1 year ago

Here are some findings with different lto settings and what works and what does not on esp32:

opt-level = 3 # important. esp-storage claims opt-level must be 2 or 3.
lto = "fat" # everything seems to work: storage write and esp-wifi

opt-level = 3 # important. esp-storage claims opt-level must be 2 or 3.
lto = "thin" # everything seems to work: storage write and esp-wifi

opt-level = 3 # important. esp-storage claims opt-level must be 2 or 3.
# NOTE THAT false MEANS THAT "thin-local" LTO IS ACTUALLY ENABLED! 
lto = false # everything seems to work: storage write and esp-wifi

opt-level = 3 # important. esp-storage claims opt-level must be 2 or 3.
lto = "off" # BROKEN. esp-storage write hangs.

rustc +esp --version
rustc 1.70.0-nightly (f112def22 2023-05-31) (1.70.0.1)

bjoernQ commented 1 year ago

Thanks for the analysis of the effects of different lto settings (cc @MabezDev)

umgefahren commented 8 months ago

Has this been resolved then? I ask because it still happens to me even with the suggested changes. So what am I doing wrong?

MabezDev commented 8 months ago

I missed this ping, sorry folks! This should be fixed for all single core chips, hence why @bjoernQ's ESP32-C3 based project above works.

For multi core, the whole world gets a bit more complicated. Neither core can access flash whilst it is being written to; if one does, all kinds of mayhem will ensue. To ensure the other core doesn't read the flash, there needs to be some kind of software solution to say to the other core, "hey, stop until I say go". You also might not want to stop the other core at all: it can still run provided it doesn't refill the cache when trying to execute, so any code already in the cache, in RAM, or in ROM can be executed freely.

So, in short, writing to flash in multicore systems is only possible when:

1) The second core isn't running, or
2) The second core is only running code already in the cache, RAM, or ROM

The only example I know of writing to flash on the esp32s3 is @bugadani's card-io-fw, and nor-fs. We don't have a generic solution for that yet. Might be some more interesting info in https://github.com/esp-rs/esp-storage/issues/26 too.
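The "stop until I say go" handshake described above can be sketched on a host machine, with two threads standing in for the two cores. This is a simulation under stated assumptions, not the approach taken by esp-storage or any of the linked projects: `FlashGate`, `poll`, and `with_other_core_parked` are hypothetical names, and on real hardware the parked core's spin loop would itself have to run from RAM or ROM.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Flags shared between the two "cores" (threads here). Hypothetical names.
struct FlashGate {
    park_requested: AtomicBool, // writer -> other core: "stop until I say go"
    parked: AtomicBool,         // other core -> writer: "I am stopped"
}

impl FlashGate {
    fn new() -> Self {
        FlashGate { park_requested: AtomicBool::new(false), parked: AtomicBool::new(false) }
    }

    /// Called by the writing core: runs `write` only while the other core is parked.
    fn with_other_core_parked<R>(&self, write: impl FnOnce() -> R) -> R {
        self.park_requested.store(true, Ordering::SeqCst);
        while !self.parked.load(Ordering::SeqCst) {
            std::hint::spin_loop();
        }
        let result = write();
        self.park_requested.store(false, Ordering::SeqCst); // "go"
        while self.parked.load(Ordering::SeqCst) {
            std::hint::spin_loop(); // wait until the other core has resumed
        }
        result
    }

    /// Called regularly by the other core; spins here while a park is requested.
    fn poll(&self) {
        if self.park_requested.load(Ordering::SeqCst) {
            self.parked.store(true, Ordering::SeqCst);
            while self.park_requested.load(Ordering::SeqCst) {
                std::hint::spin_loop();
            }
            self.parked.store(false, Ordering::SeqCst);
        }
    }
}

/// Returns the other core's flash-access count at the start and end of the write.
fn simulate_write() -> (u32, u32) {
    let gate = Arc::new(FlashGate::new());
    let flash_accesses = Arc::new(AtomicU32::new(0));
    let done = Arc::new(AtomicBool::new(false));

    let worker = {
        let (gate, flash_accesses, done) =
            (Arc::clone(&gate), Arc::clone(&flash_accesses), Arc::clone(&done));
        thread::spawn(move || {
            while !done.load(Ordering::SeqCst) {
                gate.poll();
                // Stands in for executing code that needs a cache refill from flash.
                flash_accesses.fetch_add(1, Ordering::SeqCst);
            }
        })
    };

    let observed = gate.with_other_core_parked(|| {
        let before = flash_accesses.load(Ordering::SeqCst);
        thread::sleep(Duration::from_millis(20)); // stands in for erase/write
        (before, flash_accesses.load(Ordering::SeqCst))
    });

    done.store(true, Ordering::SeqCst);
    worker.join().unwrap();
    observed
}
```

Calling `simulate_write()` returns two equal counts, because the second thread makes no simulated flash accesses while it is parked. The sketch covers only case 1 of the list above; letting the other core keep running from cache, RAM, or ROM needs no handshake but does require control over where its code is placed.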

bjoernQ commented 8 months ago

> Has this been resolved then? because it still happens to me even with the suggested changes. So, what am I doing wrong?

Does your application do anything on the second core? As @MabezDev said, for multi core just disabling interrupts is not enough. There was a PR addressing this by simply halting the other core when accessing the flash: https://github.com/esp-rs/esp-storage/pull/29

In my tests back then it worked, but I never heard back from the user who created the original issue, so it never got merged.

If you are only using one core it should work.

Does the example in this repository work for you?

umgefahren commented 8 months ago

I think the issue in my case was caused by me writing in some area where esp-wifi wanted to write.
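A conflict like the one described above can be caught before writing by checking the target range against regions other components are expected to use. The sketch below is only an illustration: the region names and offsets in `hypothetical_reserved` are placeholders chosen for the example, not a statement of where esp-wifi actually stores data, and `Region` and `first_conflict` are hypothetical helpers rather than part of any crate mentioned in this thread.

```rust
/// Half-open byte range [start, start + len) in flash.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Region {
    start: u32,
    len: u32,
}

impl Region {
    fn end(&self) -> u32 {
        self.start + self.len
    }

    /// Two half-open ranges overlap when each starts before the other ends.
    fn overlaps(&self, other: &Region) -> bool {
        self.start < other.end() && other.start < self.end()
    }
}

/// Placeholder layout for illustration only; real offsets must come from
/// the partition table actually flashed to the device.
fn hypothetical_reserved() -> Vec<(&'static str, Region)> {
    vec![
        ("nvs", Region { start: 0x9000, len: 0x6000 }),
        ("phy_init", Region { start: 0xf000, len: 0x1000 }),
    ]
}

/// Returns the name of the first reserved region the write would touch.
fn first_conflict<'a>(write: Region, reserved: &'a [(&'a str, Region)]) -> Option<&'a str> {
    reserved
        .iter()
        .find(|(_, region)| region.overlaps(&write))
        .map(|(name, _)| *name)
}
```

With this placeholder layout, a write of 0x1000 bytes at 0x9000 is reported as conflicting with "nvs", while the same write at 0x110000 returns `None`.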