Closed sglvladi closed 7 years ago
Just a quick update.
I managed to identify that the issue occurred when radio.sendACK() was run, and more specifically within receiveDone() in RFM69.cpp line 267.
Following this, I added a printout just after the noInterrupts() call in RFM69.cpp line 267, which somehow seemed to fix the issue. Later I replaced the printout with a simple delay(10), and the ESP32 gateway has been running with no issues since then.
Not sure why adding a delay there would fix the issue, but it seems to do the job. If anyone can add some reasoning behind this result it would be great!
Hi, I have a similar issue with an RFM69 running on a WeMos LOLIN32. Most of the time the error is an ESP watchdog timeout, occurring at random intervals (hours, minutes, ...):
rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
ets Jun 8 2016 00:22:57
rst:0x7 (TG0WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0010,len:4
load:0x3fff0014,len:588
load:0x40078000,len:0
load:0x40078000,len:10472
entry 0x40078a28
and very rarely a system crash.
Now this issue does indeed occur in the radio.sendACK function.
Adding a delay of 10 ms doesn't actually help; it just kills the ACK reply, which is equivalent to not sending an ACK, and therefore looks like a solution.
To protect against a watchdog timeout, delay(0) or yield() should be used; however, I couldn't find where to place it. I believe this is related to the interrupt handling, which randomly takes too much time.
I have been running a WeMos D1 mini (ESP8266) + RFM69 shield gateway for months, and I see that the system restarts about once per day. So I hope that finding a solution to this issue will also solve my ESP8266 one.
I will come back if I find a solution. Robert
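For reference, the delay(0)/yield() idea mentioned above can be sketched in plain C++ as a bounded polling loop. This is only an illustration under assumptions: waitWithYield, ready and yieldHook are hypothetical names that are not part of the RFM69 library, and the hook stands in for whatever call lets background tasks run on a given core.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper, not part of the RFM69 library: poll `ready` until it
// returns true or `timeout` elapses, calling `yieldHook` on every pass so
// that background tasks (and the watchdog feed) get a chance to run.
bool waitWithYield(const std::function<bool()>& ready,
                   const std::function<void()>& yieldHook,
                   std::chrono::milliseconds timeout) {
    assert(ready && yieldHook);  // both callbacks must be set
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!ready()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // give up instead of spinning until the watchdog fires
        }
        yieldHook();  // e.g. yield() or delay(0) on an ESP8266
    }
    return true;
}
```

The timeout matters as much as the hook: a loop that can give up never blocks for longer than the caller allows. Whether yielding alone is enough on the ESP32 is a separate question.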
Robert,
Thanks for your input. Indeed, the suggestion by @sglvladi to add delay(10) would introduce a bug, not a fix. My primary concern is for the AVR platform, but I am open to adding a fix once someone is able to find one and demonstrate that it is a fix and not a band-aid.
Felix, I totally agree that this issue is not related to the Moteino, and probably not to the RFM69 running on AVR in general. ESPx processors run several background processes that are protected by watchdog timers. It looks like the same kind of issue that we have with the Ethernet controller when its asynchronous SPI interrupts are concurrent with the RFM ones. However, it looks more complex than I thought in the beginning. So far I have a way to greatly improve it but not to solve it totally (it takes at least several hours or days to be sure that a solution is effective). So I continue "slowly" to patch the library to find the actual solution. Robert
Thank you Robert, I appreciate your care and effort.
@rrobinet and @LowPowerLab, thanks for taking an interest in this.
Just as a quick note, I didn't suggest that adding the delay is a valid solution, but rather just stated my findings in case it helps identify the real issue.
Another thing I should add is that I have tried adding yield() commands wherever there looks to be a lengthy while loop which could cause a wdt, but this does not seem to help.
@rrobinet could you possibly share your fix, even if it doesn't eliminate the problem? From what you mentioned I am not sure whether the fix refers to the ESP8266 or ESP32, but it would be helpful in any case.
Thanks once again.
Well, after spending 2 days trying to fix this issue, I still have no real solution.
With a simple test using the WeMos LOLIN struct_recieve sketch and the Moteino's struct_send, I still get random watchdog timeouts (it works well for several minutes and then suddenly there are 3 or 4 timeouts in a row).
It looks like this issue appears only during transmit (ACK and send with retries).
I didn't test with a Struct_send yet but I expect the same issue.
I am afraid that a solution can only come from the ESP32 team. I put an entry in the ESP forum (see https://www.esp32.com/viewtopic.php?f=19&t=2971), without a reply yet...
We are probably the first to try the RFM69 on a WeMos ESP32, so others will complain about the same issue in the future.
I also did some tests with the RFM69 library patched for SPI TRANSACTION, but without better results.
Because this is not a problem with the RFM69 library on AVR, I think that this issue should be continued in the ESP32 forum, or concurrently in the Moteino one, but should be closed here (up to Felix to decide).
I note that the same test with a WeMos D1 mini or Pro (ESP8266) works perfectly, so it is definitely an ESP32 issue. I will continue to follow this issue and come back if there is a solution. Robert
@rrobinet Reading the issue you posted in the ESP32 repo, I can see that you have since exchanged some comments with me-no-dev. Regarding his response, what is the "interrupt handler" that was mentioned? Does it refer to the library's interruptHandler() function, or something different?
However, I also noticed that the debug printout you are receiving is different from mine and, since we both seem to have added yield() commands here and there, I think it would be more appropriate to create a separate issue.
In any case, it would be great if @LowPowerLab could have a look at the response you received for your issue and give us his thoughts. I am referring to the "I would say the issue is in the interrupt handler. It's doing way too many things." and "Lib needs to be adapted to work on ESP32" comments you received from me-no-dev.
@sglvladi
To evaluate the WeMos LOLIN32 (ESP32 WROOM 4MB flash, Bluetooth and Wifi + Battery bup) I am porting my Home Automation gateway from a Wemos D1 mini + RFM69 shield + Wifi + MQTT.
I had/have several issues (more or less resolved); the current one is a regular reset on time-out (no crash).
As you saw, I have an open issue at https://github.com/espressif/arduino-esp32/issues/624, and not much success on https://www.esp32.com/viewtopic.php?f=19&t=2971.
As explained and also tested, the yield() command has no effect on the ESP32, so this is not the solution.
The current hint is to try to reduce the processing time of the interruptHandler, which is no piece of cake, I believe.
I am a little busy for the time being and will have less time to test it in the coming weeks; anyway, if this is the solution, it will be a major change for the RFM library.
If your issue is a crash rather than a time-out, you should maybe open a new issue at https://github.com/espressif/arduino-esp32 ... To be continued. Robert
@sglvladi Trying an old version of the ESP32 core for Arduino, I also get crashes, which may explain yours:
Transmitting at 433 Mhz...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... nothing...
Sending struct (12 bytes) ... Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)
Register dump:
PC : 0x40083aaa PS : 0x00060034 A0 : 0x80085003 A1 : 0x3ffc0590
A2 : 0x3ffc1408 A3 : 0x00060021 A4 : 0x00060c23 A5 : 0x00000020
A6 : 0x00000020 A7 : 0x00060b23 A8 : 0xb33f0001 A9 : 0x00000001
A10 : 0x00060021 A11 : 0x00000000 A12 : 0x00060021 A13 : 0x00000000
A14 : 0xffffffff A15 : 0x3ffc8474 SAR : 0x00000014 EXCCAUSE: 0x00000005
EXCVADDR: 0x00000000 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000
Backtrace: 0x40083aaa:0x3ffc0590 0x40085003:0x3ffc05b0 0x40083a0a:0x3ffc05d0 0x40085b0c:0x3ffc05f0 0x40081bad:0x3ffc0600
CPU halted.
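For anyone decoding such a dump by hand: each entry on the Backtrace line is a PC:SP pair, and only the PC half is needed for symbol lookup. A minimal sketch in plain C++ follows; the function name backtracePcs is made up for illustration, and this is not how EspExceptionDecoder is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Extract the program-counter half of each "PC:SP" pair on a backtrace line.
std::vector<uint32_t> backtracePcs(const std::string& line) {
    std::vector<uint32_t> pcs;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        const auto colon = token.find(':');
        // Skip the "Backtrace:" label (colon at the end) and plain words.
        if (colon == std::string::npos || colon + 1 == token.size()) continue;
        // Only accept tokens that start with a hex prefix.
        if (token.rfind("0x", 0) != 0) continue;
        const unsigned long pc = std::stoul(token.substr(0, colon), nullptr, 16);
        assert(pc <= 0xFFFFFFFFul);  // ESP32 addresses are 32 bits wide
        pcs.push_back(static_cast<uint32_t>(pc));
    }
    return pcs;
}
```

The resulting addresses can then be passed, together with the sketch's ELF file, to a symbol lookup tool such as addr2line from the Xtensa toolchain.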
@rrobinet Yes you are completely right! I upgraded to the latest arduino-esp32 version (currently 0.10.0) yesterday and started getting the same wdt error you have reported. I guess it was the same issue hidden under different debugging between the two versions.
~Just as a note, I have made a simplistic Ticker.h library for the ESP32 (see here) to drive some LEDs and button interrupts on the gateway, which seems to work pretty well while no messages are received; however, as soon as a node starts transmitting, the wdt reset happens much more often than before (at least once a minute). Again, I assume this is related to the length of interruptHandler(), and more specifically to the relatively long time spent between calling noInterrupts() and interrupts() or maybeInterrupts(). I also noticed that the wdt seems to happen more often when the devices are further apart, or if I interfere with the antennas. This also supports the above assumption, as the frame sending and reception times are increased, which means that more time is spent in the relevant sections of the code, leading to more frequent resets.~
Without undermining the usefulness of this RFM69 library (i.e. I mean no offence to @LowPowerLab), I think it is a great library for AVRs and the ESP8266, but I will start experimenting with the RadioHead library, at least until a fix is found for this one. Even though it does not officially support the ESP32 yet, I did some digging around in their forum and found that someone has already done some work to add ESP32 compatibility (see here). After enquiring about it, he was kind enough to share his fork, which he says has been working fine for him, but which from what I can tell has only been tested with an RFM95. I suspect that his contribution sits in the lower-level library drivers and thus the RFM69 should also work. Even if it "works", though, it would still remain to be checked whether it suffers from similar (wdt) issues. In any case, I will have a play around and will keep you updated.
UPDATE: I have just seen the last correspondence you have had under espressif/arduino-esp32#624, which I guess means that whatever I said above regarding the interrupts is invalid, and thus has been crossed out. At least it sounds like you have managed to identify the problem. 👍
@sglvladi Yes, it looks like the issue is due to the fact that SPI is handled as an interrupt and therefore may not be used inside an interrupt routine. They have proposed a patch to me, but it looks like it makes things worse rather than better. However, I will continue with this ESP32 issue, hoping for a workaround that makes the RFM69 library compatible with the WeMos LOLIN. Note that I will be abroad for one week and not able to follow it.
A long time ago I had a look at the RadioHead library, which seems very complex for me to understand (too many modules). Also, I use a patched version of the RFM69 library for secure RFM sessions (RFM69_Sessionkey). So if I want to use the WeMos LOLIN I need a working version of the RFM69 one. See you in one week. Robert
@sglvladi
OK, I am finalising an RFM69 library where all SPI transfers have been moved from the RFM69X::interruptHandler() routine to a new one activated during RFM69X::receiveDone(). It looks to be working pretty well.
I have tested it on Moteino, Arduino and WeMos LOLIN, but I still have to do some tests on the WeMos D1 mini and on an Arduino with an Ethernet controller. Once done I will submit it to you so that you can also test it.
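The idea of moving the SPI work out of the interrupt can be illustrated with a small host-side simulation in plain C++. This is a sketch of the general deferred-interrupt pattern, not the actual RFM69X code: FakeRadio and DeferredReceiver are hypothetical names, and the FIFO read stands in for the real SPI transfers.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the radio; on real hardware the FIFO would be read over SPI.
struct FakeRadio {
    std::vector<uint8_t> fifo;
    int spiReads = 0;

    // Simulates a packet arriving over the air.
    void loadPacket(const std::vector<uint8_t>& bytes) {
        assert(!bytes.empty());
        fifo = bytes;
    }

    // Simulates the SPI burst read that empties the FIFO.
    std::vector<uint8_t> readFifo() {
        ++spiReads;
        std::vector<uint8_t> out;
        out.swap(fifo);
        return out;
    }
};

class DeferredReceiver {
public:
    explicit DeferredReceiver(FakeRadio& radio) : radio_(radio) {}

    // Called from the interrupt: no SPI here, only note that a packet waits.
    void interruptHandler() {
        packetPending_.store(true, std::memory_order_release);
    }

    // Called from the main loop: the SPI work happens outside the interrupt.
    bool receiveDone() {
        // Clear the flag before reading so a later interrupt is not lost.
        if (!packetPending_.exchange(false, std::memory_order_acq_rel)) {
            return false;
        }
        payload_ = radio_.readFifo();
        return !payload_.empty();
    }

    const std::vector<uint8_t>& payload() const { return payload_; }

private:
    FakeRadio& radio_;
    std::atomic<bool> packetPending_{false};
    std::vector<uint8_t> payload_;
};
```

With this split the interrupt itself stays very short, but a packet is only fetched when the main loop next calls receiveDone(), so the loop has to call it often enough.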
@LowPowerLab
Felix, of course this is a major change to the current library that also requires the virtualised library(ies) to be adapted and makes the data processing a little bit slower, so I imagine that this new version will be, and will stay, an exotic one.
Do you agree if I publish it on github?
@rrobinet Yes I agree, this can be kept as a fork dedicated mainly to ESP devices.
@sglvladi and @LowPowerLab I have posted updated versions at https://github.com/rrobinet/RFM69X_Library and https://github.com/rrobinet/RFM69X_SessionKey-Library, tested with the WeMos LOLIN32 and expected to be compatible with all ESP-32 processors. I use the letter 'X' as the ESP-32 extension. The RFM69X is just a modification of the last RFM69 library (July 2014), with the following remarks:
if (!radio.initialize(FREQUENCY, NODEID, NETWORKID))
{
  Serial.println ("\n****************************************************************");
  Serial.println (" WARNING: RFM Transceiver initialisation failure: Set-up Halted ");
  Serial.println ("****************************************************************");
  while (1); // Halt the process
}
This version was (not extensively) tested with Arduino UNO/MEGA, Moteino/MEGA, WeMos LOLIN 32, and WeMos D1 mini and Pro, and looks to be OK. I note that the RFM69X_SessionKey on the WeMos LOLIN is a little bit slower than the original one.
And finally, it is an AS IS version with no guarantee of support. If you agree, I believe this issue may be closed. Robert
Thank you @rrobinet. Consider adding a forward-compatible license.
I am running a slightly modified version of the Gateway example, having removed the SPIFlash for compatibility reasons.
Here's a snippet:
After a random number of messages I receive the following error:
Decoding the backtrace using EspExceptionDecoder I get the following:
Anyone have a clue why this could be happening?
Thanks much in advance.