Closed manio closed 1 year ago
I tested with the blocking API and have the ESP32 connected for more than 17h now without problems. So, it's not really normal. I also had (some time ago) a temperature logging application running for several days without issues.
I wouldn't really expect using async to be the problem but maybe worth a try.
In general, the examples are kept simple - they don't do re-connects or re-tried connects.
Can you get some more information from your access-point's logs? (e.g. a disconnect reason)
What power source do you use for the ESP32? A bad power source can definitely cause WiFi issues
@bjoernQ Thanks for your tips.
Most of the development time I am using a direct USB connection and the ESP-WROOM-32U is powered from the PC; but I was suspecting the power supply and I just connected the device once to the separated lab power supply with the same result. Nevertheless I think that I could give it another try...
When it stops working it also stops printing my debug lines from every async task (even not wifi-related) with println!
on my espflash monitor
, so my AP logs are probably not very useful, as I can clearly see on the console, that the device has stopped working.
First I think I'll try the blocking code and see if it helps. I also find some problem with the wifi init described in issue: https://github.com/esp-rs/esp-wifi/issues/214, so maybe this is some tip for me...
I just started the regular (blocking) dhcp example... will see how long it will run.
btw: I was also trying to update to the latest esp-hal github version to see if this helps, so I updated my dependencies
in Cargo to git = "https://github.com/esp-rs/esp-hal"
instead of 0.13.0
, but this is giving me the following error:
error: One of the feature flags must be provided: embassy-time-systick, embassy-time-timg0,
--> /home/manio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-hal-common-0.10.0/build.rs:27:9
|
27 | compile_error!(concat!("One of the feature flags must be provided: ", $($all, ", "),*));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
162 | assert_unique_used_features!("embassy-time-systick", "embassy-time-timg0");
| -------------------------------------------------------------------------- in this macro invocation
|
= note: this error originates in the macro `assert_used_features` which comes from the expansion of the macro `assert_unique_used_features` (in Nightly builds, run with -Z macro-backtrace for more info)
Do you have any idea how to fix this? I still have the features set for the esp32-hal
crate:
features = ["embassy", "embassy-time-timg0", "async"]
@jessebraham, @MabezDev: do you guys maybe know how to fix the above compilation error?
My complete dependencies set is currently:
[dependencies]
embassy-executor = { version = "0.2.0", features = ["nightly", "integrated-timers", "arch-xtensa", "executor-thread"] }
static_cell = "1.1.0"
esp-wifi = { git = "https://github.com/esp-rs/esp-wifi", features = ["esp32", "embassy-net", "embedded-svc", "wifi"] }
embassy-time = { version = "0.1.2", features = ["nightly", "tick-hz-1_000_000"] }
embassy-net = { version = "0.1.0", features = [ "nightly", "tcp", "udp", "dhcpv4", "medium-ethernet" ] }
hal = { package = "esp32-hal", git = "https://github.com/esp-rs/esp-hal", features = ["embassy", "embassy-time-timg0", "async"] }
esp-backtrace = { version = "0.7.0", features = ["esp32", "panic-handler", "exception-handler", "print-uart"] }
esp-println = { version = "0.5.0", features = ["esp32", "log"] }
embedded-svc = { version = "0.25.3", default-features = false }
embedded-hal-async = { version = "=0.2.0-alpha.2"}
log = "0.4.16"
embedded-io = "0.4.0"
format_no_std = "1.0.0"
I just started the regular (blocking) dhcp example... will see how long it will run.
btw: I was also trying to update to the latest esp-hal github version to see if this helps, so I updated my dependencies in Cargo to
git = "https://github.com/esp-rs/esp-hal"
instead of0.13.0
, but this is giving me the following error:error: One of the feature flags must be provided: embassy-time-systick, embassy-time-timg0, --> /home/manio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-hal-common-0.10.0/build.rs:27:9 | 27 | compile_error!(concat!("One of the feature flags must be provided: ", $($all, ", "),*)); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ... 162 | assert_unique_used_features!("embassy-time-systick", "embassy-time-timg0"); | -------------------------------------------------------------------------- in this macro invocation | = note: this error originates in the macro `assert_used_features` which comes from the expansion of the macro `assert_unique_used_features` (in Nightly builds, run with -Z macro-backtrace for more info)
Do you have any idea how to fix this? I still have the features set for the
esp32-hal
crate:features = ["embassy", "embassy-time-timg0", "async"]
It's better to patch the dependencies like
[patch.crates-io]
esp-hal-common = { git = "https://github.com/esp-rs/esp-hal" }
esp32-hal = { git = "https://github.com/esp-rs/esp-hal" }
esp32s3 = { git = "https://github.com/esp-rs/esp-pacs", rev = "ce6d5b5" }
@bjoernQ Thanks, with this dependencies patch it is compiling fine :) I didn't test the binary though, as I am still testing the dhcp blocking example (so far so good... :)
Ok - I've tested it during the whole night (blocking dhcp example):
--- esp ping statistics ---
53863 packets transmitted, 53856 received, 0.0129959% packet loss, time 53895027ms
rtt min/avg/max/mdev = 3.665/13.394/82.319/3.590 ms
almost 15h without any problems and it was still working :)
Then I switched back to embassy_dhcp
and it hangs after 3h 43m.
Do you maybe have some "spare" ESP32 to test the embassy_dhcp
on?
Ok just repurposed that ESP32 for the embassy_dhcp
test.
The blocking code was still working after more than 45h
Let's see how that will work for me
Ok ... seems it already died for me. So, I'm pretty sure there is some instability specific to the async side of things
Oh - so this is good, that it is not specific to me only :)
I've just updated my main program using your [patch.crates-io]
,
specifically esp-hal
from 0.13.0
to git version,
embassy-time
from 0.1.1
to 0.1.2
and I also did cargo update
to have fresh git version of esp-wifi
and esp-hal*
plus one API change:
diff --git a/src/main.rs b/src/main.rs
index 842bfb7..955d17c 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -155,7 +155,7 @@ fn main() -> ! {
)
.unwrap();
let (wifi, _) = peripherals.RADIO.split();
- let (wifi_interface, controller) = esp_wifi::wifi::new_with_mode(&init, wifi, WifiMode::Sta);
+ let (wifi_interface, controller) = esp_wifi::wifi::new_with_mode(&init, wifi, WifiMode::Sta).unwrap();
let timer_group0 = TimerGroup::new(peripherals.TIMG0, &clocks, &mut system.peripheral_clock_control);
let mut wdt = timer_group0.wdt;
... and I am testing it - it died very shortly before that changes, but now it seems it is working - I'll see how long...
Nope :( I was hoping that raising dependencies could fix this problem but after 5 hrs it also died :(
Are we able to bisect it somehow? (I know it would be a pain).... or at least review a git log of the async stuff... We have the "bad" but what about the good one? This is maybe not a rule but I observed during my development, that adding more async tasks - the better chance to get the "freeze" sooner then later...
Ok let's summarize what we know / don't know
What we know
What we don't know
cc @MabezDev @jessebraham any ideas or did you ever experience problems with async in general
I setup an ESP32-C3 with the modified embassy_dhcp
example - let us see if it runs into the same problem
Seems the C3 also died - seems it lasted slightly longer for me but still just roughly two hours
This probably isn't specific to esp-wifi. I've experienced random lockups before I added wifi to my project, but in my case my MCU is only ever running for a few minutes at a time. My last test ran for about 20 minutes before I stopped it, I can probably set up something longer term, too.
In my case, the problem seemed to go away by itself, but maybe it really didn't.
This probably isn't specific to esp-wifi. I've experienced random lockups before I added wifi to my project, but in my case my MCU is only ever running for a few minutes at a time. My last test ran for about 20 minutes before I stopped it, I can probably set up something longer term, too.
In my case, the problem seemed to go away by itself, but maybe it really didn't.
Interesting! I tried running an async blinky with some additional tasks also just using timers but maybe that was too simple and worked for a few hours before I needed the dev-board for something else. If you find a way to repro a similar thing without esp-wifi
that would be very interesting.
an async blinky
My old hypothesis was that the firmware might be missing interrupts in some currently unexplained cases, so to me it's unlikely for a blinky to lock up. My use case is my ecg board on a charger, measuring noise at 1kHz and displaying it at 100Hz, which is probably reasonably hungry and it is also obvious at a glance if it locks up. I will report back if the thing locks up, or if I have to stop because it catches fire.
One more thing: you'll want to use an embassy-executor that includes this PR on Xtensa: https://github.com/embassy-rs/embassy/pull/1451 The issue it points to have been repurposed since originally opening it, but it is a fix for a potential deadlock I was hitting quite often.
It doesn't explain why the C3 dies, though.
I have stopped my test after 16 hours. Either the executor I'm using isn't prone to the lockup in this issue, or it really is specific to esp-wifi. While the issue with C3 seems to suggest the latter, it would still be awesome if you could retry an Xtensa reproducer with embassy-executor = "0.2.1"
, as it fixes a bug that may cause firmware to randomly freeze.
On Thu, Aug 10, 2023 at 09:05:19PM -0700, Dániel Buga wrote:
I have stopped my test after 16 hours. Either the executor I'm using isn't prone to the lockup in this issue, or it really is specific to esp-wifi. While the issue with C3 seems to suggest the latter, it would still be awesome if you could retry an Xtensa reproducer with embassy-executor = "0.2.1", as it fixes a bug that may cause firmware to randomly freeze. I thing I could try this later as I'm working in office today...
Ok guys, I just want to put an update here:
Till now I was testing the blocking dhcp example to make sure if it really don't freeze at some point. And indeed it looks it is working perfectly fine. My ping stats are:
--- esp ping statistics ---
195443 packets transmitted, 195412 received, 0.0158614% packet loss, time 195565305ms
rtt min/avg/max/mdev = 3.750/14.138/356.586/4.744 ms
Which means it was running fine for over 2 days and 6 hours until I stopped it now.
I was trying to run the embassy_dhcp
example with embassy-executor
set to 0.2.1
as @bugadani suggests, but cargo screams at me about dependency problem (probably with some other crates, as it's stick to specific version in the main esp-wifi
repository); so I just upgraded it in my own project (which has more recent git revisions of dependent crates) and It compiled fine.
So now I am testing my program based on embassy_dhcp
+ additional task which is reading/writing the UART...
Hi, It look like you tracked this problem down! :) My ESP32 is still working, it is about 14h now. I think this indeed was a problem. Thank you very much, guys! I'm ok if you close this.
@bugadani Just curious: the https://github.com/embassy-rs/embassy/pull/1451 was for a regression or this problem was there from the beginning?
The problem was there from the start, I believe. The code was perfectly good for single-core MCUs, but some of the particulars of a dual-core (like the original ESP32, or the S3) weren't considered for reasons that are probably lost in time at this point. This isn't exactly unexpected. What is somewhat surprising to me, though, is that it took almost a year to recognise and diagnose it. (And another 3 months to release the fix, but that's mostly my forgetfulness 🙈)
Upgrading embassy-executor to v0.2.1 fixed the instability for me. @bugadani, thanks for pointing out the solution!
Hi! I am currently testing my DevKitC ESP-WROOM-32U. After some consideration I've chosen the no-std and esp-wifi with the embassy. Generally speaking my project is compiling and "quite" working. My main problem is stability. Sometimes it is working for several minutes, sometimes even some hours. I even enabled the watchdog - and it is resetting correctly when the problem occurs but this is rather a workaround for the problem.
As a last resort I just compiled and flashed a pure esp-wifi example (
embassy_dhcp
) to eliminate the factor of something wrong with my code. The result is unfortunately the same - the device is working for several hours max. Then it stops working (I believe that the embassy async tasks just stop working) - so the ping reply also stops. The chip is not hot (but It has to work with full CPU speed because of wifi).I understand that there are some glitches/instability issues at this stage of adoption.
I'll be glad if you can confirm/deny this problem on your environment/devices - in other words: I am just wondering if this is only affecting me (or my device) - or this is rather some general problem which will hopefully be fixed in the future versions...
btw: when the program is starting I've got:
I hope this is OK. Once during my testing I have this line in the log:
INFO - Unhandled event: StaBeaconTimeout
When starting I've got:
Steps to reproduce:
cargo run --example embassy_dhcp --release --features "async,embedded-svc,wifi,embassy-net"
Note: I disabled the TCP connection to remote server like this:/*let mut socket = TcpSocket::new(&stack, &mut rx_buffer, &mut tx_buffer);
@@ -184,5 +186,6 @@ async fn task(stack: &'static Stack<WifiDevice<'static>>) { println!("{}", core::str::from_utf8(&buf[..n]).unwrap()); } Timer::after(Duration::from_millis(3000)).await;
--- esp ping statistics --- 13138 packets transmitted, 7042 received, +6041 errors, 46.3998% packet loss, time 13181183ms rtt min/avg/max/mdev = 5.369/14.596/53.583/3.680 ms, pipe 4