LowPowerLab / RFM69

RFM69 library for RFM69W, RFM69HW, RFM69CW, RFM69HCW (semtech SX1231, SX1231H)
GNU General Public License v3.0

ESP32 random crashes #81

Closed sglvladi closed 7 years ago

sglvladi commented 7 years ago

I am running a slightly modified version of the Gateway example, having removed the SPIFlash for compatibility reasons.

Here's a snippet:

// Sample RFM69 receiver/gateway sketch, with ACK and optional encryption, and Automatic Transmission Control
// Passes through any wireless received messages to the serial port & responds to ACKs
// It also looks for an onboard FLASH chip, if present
// **********************************************************************************
// Copyright Felix Rusu 2016, http://www.LowPowerLab.com/contact
// **********************************************************************************
// License
// **********************************************************************************
// This program is free software; you can redistribute it 
// and/or modify it under the terms of the GNU General    
// Public License as published by the Free Software       
// Foundation; either version 3 of the License, or        
// (at your option) any later version.                    
//                                                        
// This program is distributed in the hope that it will   
// be useful, but WITHOUT ANY WARRANTY; without even the  
// implied warranty of MERCHANTABILITY or FITNESS FOR A   
// PARTICULAR PURPOSE. See the GNU General Public        
// License for more details.                              
//                                                        
// Licence can be viewed at                               
// http://www.gnu.org/licenses/gpl-3.0.txt
//
// Please maintain this license information along with authorship
// and copyright notices in any redistribution of this code
// **********************************************************************************
#include <RFM69.h>         //get it here: https://www.github.com/lowpowerlab/rfm69
#include <RFM69_ATC.h>     //get it here: https://www.github.com/lowpowerlab/rfm69
#include <SPI.h>           //included with Arduino IDE install (www.arduino.cc)

#if defined(__AVR_ATmega168__) || defined(__AVR_ATmega328P__) || defined(__AVR_ATmega88) || defined(__AVR_ATmega8__) || defined(__AVR_ATmega88__)
#define RFM69_CS      10
#define RFM69_IRQ     2
#define RFM69_IRQN    digitalPinToInterrupt(RFM69_IRQ)
#define RFM69_RST     9
#define LED           13  // onboard blinky
#elif defined(__arm__)//Use pin 10 or any pin you want
// Tested on Arduino Zero
#define RFM69_CS      10
#define RFM69_IRQ     5
#define RFM69_IRQN    digitalPinToInterrupt(RFM69_IRQ)
#define RFM69_RST     6
#define LED           13  // onboard blinky
#elif defined(ESP8266)
// ESP8266
#define RFM69_CS      15  // GPIO15/HCS/D8
#define RFM69_IRQ     4   // GPIO04/D2
#define RFM69_IRQN    digitalPinToInterrupt(RFM69_IRQ)
#define RFM69_RST     2   // GPIO02/D4
#define LED           0   // GPIO00/D3, onboard blinky for Adafruit Huzzah
#else
#define RFM69_CS      5
#define RFM69_IRQ     16
#define RFM69_IRQN    digitalPinToInterrupt(RFM69_IRQ)
#define RFM69_RST     17
#endif

//*********************************************************************************************
//************ IMPORTANT SETTINGS - YOU MUST CHANGE/CONFIGURE TO FIT YOUR HARDWARE *************
//*********************************************************************************************
#define NODEID        1    //unique for each node on same network
#define NETWORKID     200  //the same on all nodes that talk to each other
//Match frequency to the hardware version of the radio on your Moteino (uncomment one):
//#define FREQUENCY     RF69_433MHZ
#define FREQUENCY     RF69_868MHZ
//#define FREQUENCY     RF69_915MHZ
#define ENCRYPTKEY    "sampleEncryptKey" //exactly the same 16 characters/bytes on all nodes!
#define IS_RFM69HW    //uncomment only for RFM69HW! Leave out if you have RFM69W!
//*********************************************************************************************
//Auto Transmission Control - dials down transmit power to save battery
//Usually you do not need to always transmit at max output power
//By reducing TX power even a little you save a significant amount of battery power
//This setting enables this gateway to work with remote nodes that have ATC enabled to
//dial their power down to only the required level
//#define ENABLE_ATC    //comment out this line to disable AUTO TRANSMISSION CONTROL
//*********************************************************************************************
#define SERIAL_BAUD   115200

#ifdef __AVR_ATmega1284P__
  #define LED           15 // Moteino MEGAs have LEDs on D15
  #define FLASH_SS      23 // and FLASH SS on D23
#else
  #define LED           9 // Moteinos have LEDs on D9
  #define FLASH_SS      8 // and FLASH SS on D8
#endif

#ifdef ENABLE_ATC
  RFM69_ATC radio;
#else
  RFM69 radio;
#endif

bool promiscuousMode = false; //set to 'true' to sniff all packets on the same network

void setup() {
  Serial.begin(SERIAL_BAUD);
  delay(10);
  Serial.println("Here");
  // Initialize radio
  radio = RFM69(RFM69_CS, RFM69_IRQ, true, RFM69_IRQN);
  // Hard Reset the RFM module
  pinMode(RFM69_RST, OUTPUT);
  digitalWrite(RFM69_RST, HIGH);
  delay(100);
  digitalWrite(RFM69_RST, LOW);
  delay(100);

  Serial.println("Here!");
  if (!radio.initialize(FREQUENCY,NODEID,NETWORKID)) {
    Serial.println("radio.initialize failed!");
  }

  Serial.println("Here!!");
  #ifdef IS_RFM69HW
    radio.setHighPower(); //only for RFM69HW!
  #endif

  radio.setPowerLevel(31);
  radio.encrypt(ENCRYPTKEY);
  //radio.promiscuous(promiscuousMode);
  char buff[50];
  sprintf(buff, "\nListening at %d Mhz...", FREQUENCY==RF69_433MHZ ? 433 : FREQUENCY==RF69_868MHZ ? 868 : 915);
  Serial.println(buff);
  Serial.print("Network "); Serial.println(NETWORKID);
  Serial.print("Node "); Serial.println(NODEID);
  Serial.print("Encryptkey "); Serial.println(ENCRYPTKEY);

  Serial.println();

#ifdef ENABLE_ATC
  Serial.println("RFM69_ATC Enabled (Auto Transmission Control)");
#endif
}

byte ackCount=0;
uint32_t packetCount = 0;
void loop() {

  if (radio.receiveDone())
  {
    Serial.print("#[");
    Serial.print(++packetCount);
    Serial.print(']');
    Serial.print('[');Serial.print(radio.SENDERID, DEC);Serial.print("] ");
    if (promiscuousMode)
    {
      Serial.print("to [");Serial.print(radio.TARGETID, DEC);Serial.print("] ");
    }
    for (byte i = 0; i < radio.DATALEN; i++)
      Serial.print((char)radio.DATA[i]);
    Serial.print("   [RX_RSSI:");Serial.print(radio.RSSI);Serial.print("]");

    if (radio.ACKRequested())
    {
      byte theNodeID = radio.SENDERID;
      radio.sendACK();
      Serial.print(" - ACK sent.");

      // When a node requests an ACK, respond to the ACK
      // and also send a packet requesting an ACK (every 3rd one only)
      // This way both TX/RX NODE functions are tested on 1 end at the GATEWAY
      if (ackCount++%3==0)
      {
        Serial.print(" Pinging node ");
        Serial.print(theNodeID);
        Serial.print(" - ACK...");
        delay(3); //need this when sending right after reception .. ?
        if (radio.sendWithRetry(theNodeID, "ACK TEST", 8, 0))  // 0 = only 1 attempt, no retries
          Serial.print("ok!");
        else Serial.print("nothing");
      }
    }
    Serial.println();
  }
}

After a random number of messages I receive the following error:

Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Register dump:
PC      : 0x40083537  PS      : 0x00060034  A0      : 0x80084b3b  A1      : 0x3ffc0590  
A2      : 0x3ffc1380  A3      : 0x00060021  A4      : 0x00060e23  A5      : 0x00000020  
A6      : 0x00000020  A7      : 0x00060023  A8      : 0xb33f0000  A9      : 0xb33fffff  
A10     : 0x00060021  A11     : 0x00000000  A12     : 0x00060021  A13     : 0x3ffc7724  
A14     : 0x00000003  A15     : 0x0000000f  SAR     : 0x00000012  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  

Backtrace: 0x40083537:0x3ffc0590 0x40084b3b:0x3ffc05b0 0x400834ce:0x3ffc05d0 0x40085644:0x3ffc05f0 0x40081b09:0x3ffc0600

Decoding the backtrace using EspExceptionDecoder I get the following:

0x40083537: uxPortCompareSet at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/include/freertos/portmacro.h line 239
:  (inlined by) vPortCPUAcquireMutex at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/./port.c line 315
0x40083537: uxPortCompareSet at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/include/freertos/portmacro.h line 239
:  (inlined by) vPortCPUAcquireMutex at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/./port.c line 315
0x40084b3b: xTaskIncrementTick at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/./tasks.c line 4420
0x400834ce: xPortSysTickHandler at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/./port.c line 275 (discriminator 1)
0x40085644: _frxt_timer_int at ?? line ?
0x40081b09: _xt_lowint1 at xtensa_vectors.o line ?

Anyone have a clue why this could be happening?

Thanks much in advance.

sglvladi commented 7 years ago

Just a quick update.

I traced the issue to the point where radio.sendACK(); is run, and more specifically to receiveDone(); in RFM69.cpp line 267.

Following this, I added a printout just after the noInterrupts(); in RFM69.cpp line 267, which somehow seemed to fix the issue. Later I replaced the printout with a simple delay(10) and the ESP32 gateway has been running with no issues since then.

Not sure why adding a delay there would fix the issue, but it seems to do the job. If anyone can add some reasoning behind this result it would be great!

rrobinet commented 7 years ago

Hi, I have a similar issue with the RFM69 running on a WeMos LOLIN32. Most of the time the error is an ESP watchdog timeout, occurring at random intervals (hours, minutes, ...):

rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
ets Jun  8 2016 00:22:57

rst:0x7 (TG0WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0010,len:4
load:0x3fff0014,len:588
load:0x40078000,len:0
load:0x40078000,len:10472
entry 0x40078a28

and very rarely a system crash. This issue does indeed occur in the radio.sendACK() function. Adding a 10 ms delay doesn't actually help: it just kills the ACK reply, which is equivalent to not sending an ACK at all, and therefore only looks like a solution. To protect against a watchdog timeout, delay(0) or yield() should be used, but I couldn't find where to place it. I believe this is related to the interrupt handling, which randomly takes too much time.

I have been running a WeMos D1 mini (ESP8266) + RFM69 shield gateway for months, and I see the system restart about once per day. So I hope that finding a solution to this issue will also solve my ESP8266 one.

I will come back if I find a solution. Robert

LowPowerLab commented 7 years ago

Robert, Thanks for your input. Indeed the suggestion by @sglvladi to add delay(10) would introduce a bug not a fix. My primary concern is for the AVR platform, but I am open to adding a fix once someone is able to find one and demonstrate it is a fix and not a band aid.

rrobinet commented 7 years ago

Felix, I totally agree that this issue is not related to the Moteino, and probably not to the RFM69 running on AVR in general. ESPx processors run several background processes that are protected by watchdog timers. It looks like the same kind of issue that we have with the Ethernet controller, when asynchronous SPI interrupts run concurrently with the RFM ones. However, it looks more complex than I thought at the beginning. So far I have a way to greatly improve it but not to solve it completely (it takes several hours or days of running to be sure that a fix is effective). So I continue, "slowly", to patch the library to find the actual solution. Robert

LowPowerLab commented 7 years ago

Thank you Robert, I appreciate your care and effort.

sglvladi commented 7 years ago

@rrobinet and @LowPowerLab, thanks for taking an interest in this.

Just as a quick note, I didn't suggest that adding the delay is a valid solution, but rather just stated my findings in case it helps identify the real issue.

Another thing I should add is that I have tried adding yield() commands wherever there looks to be a lengthy while loop which could cause a wdt, but this does not seem to help.

@rrobinet could you possibly share your fix, even if it doesn't eliminate the problem? From what you mentioned I am not sure whether the fix refers to the ESP8266 or ESP32, but it would be helpful in any case.

Thanks once again.

rrobinet commented 7 years ago

Well, after spending 2 days trying to fix this issue, I still have no real solution. With the simple Struct_receive test sketch on the WeMos LOLIN and Struct_send on a Moteino, I still get random watchdog timeouts (it works well for several minutes and then suddenly there are 3 or 4 timeouts in a row). It looks like this issue appears only during transmit (ACK and send with retries). I haven't tested with Struct_send yet, but I expect the same issue. I am afraid that a solution can only come from the ESP32 team. I posted an entry in the ESP forum (see https://www.esp32.com/viewtopic.php?f=19&t=2971), without a reply yet... We are probably the first to try the RFM69 on a WeMos ESP32, so others will complain about the same issue in the future. I also ran some tests with the RFM69 library patched for SPI transactions, but without better results. Because this is not a problem with the RFM69 library on AVR, I think that this issue should be continued in the ESP32 forum, or concurrently in the Moteino one, but should be closed here (up to Felix to decide).

I note that the same test with a WeMos D1 mini or Pro (ESP8266) works perfectly, so it is definitely an ESP32 issue. I will continue to follow this issue and come back if there is a solution. Robert

sglvladi commented 7 years ago

@rrobinet Reading the issue you posted in the ESP32 repo, I can see that you have since exchanged some comments with me-no-dev. In regards to his response, what is the "interrupt handler" that was mentioned? Does it refer to the library's interruptHandler() function, or something different?

However, I also noticed that the debug print out you are receiving is different to mine and, since we both seem to have added yield() commands here and there, I think it would be more appropriate to create a separate issue.

In any case, it would be great if @LowPowerLab could have a look at the response you received for your issue and give us his thoughts. I am referring to the "I would say the issue is in the interrupt handler. It's doing way too many things." and "Lib needs to be adapted to work on ESP32" comments you received from me-no-dev.

rrobinet commented 7 years ago

@sglvladi To evaluate the WeMos LOLIN32 (ESP32 WROOM 4MB flash, Bluetooth and Wifi + Battery bup), I am porting my Home Automation gateway from a WeMos D1 mini + RFM69 shield + Wifi + MQTT. I had (and have) several issues, more or less resolved; the current one is a regular reset on timeout (no crash). As you saw, I have an open issue at https://github.com/espressif/arduino-esp32/issues/624, and not much success at https://www.esp32.com/viewtopic.php?f=19&t=2971. As explained there and also tested, the yield() command has no effect on the ESP32, so this is not the solution. The current hint is to try to reduce the processing time of the interruptHandler, which I believe is not a piece of cake.
I am a little busy for the time being and will have less time to test it in the weeks to come. In any case, if this is the solution, it will be a major change for the RFM library.

If your issue is a crash rather than a timeout, you should maybe open a new issue at https://github.com/espressif/arduino-esp32 ... To be continued. Robert

rrobinet commented 7 years ago

@sglvladi Trying an old version of the ESP32 core for Arduino, I also get crashes, which may explain yours:

Transmitting at 433 Mhz...

Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ...  nothing...
Sending struct (12 bytes) ... Guru Meditation Error: Core  0 panic'ed (Interrupt wdt timeout on CPU0)
Register dump:
PC      : 0x40083aaa  PS      : 0x00060034  A0      : 0x80085003  A1      : 0x3ffc0590  
A2      : 0x3ffc1408  A3      : 0x00060021  A4      : 0x00060c23  A5      : 0x00000020  
A6      : 0x00000020  A7      : 0x00060b23  A8      : 0xb33f0001  A9      : 0x00000001  
A10     : 0x00060021  A11     : 0x00000000  A12     : 0x00060021  A13     : 0x00000000  
A14     : 0xffffffff  A15     : 0x3ffc8474  SAR     : 0x00000014  EXCCAUSE: 0x00000005  
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000  

Backtrace: 0x40083aaa:0x3ffc0590 0x40085003:0x3ffc05b0 0x40083a0a:0x3ffc05d0 0x40085b0c:0x3ffc05f0 0x40081bad:0x3ffc0600

CPU halted.

sglvladi commented 7 years ago

@rrobinet Yes you are completely right! I upgraded to the latest arduino-esp32 version (currently 0.10.0) yesterday and started getting the same wdt error you have reported. I guess it was the same issue hidden under different debugging between the two versions.

~Just as a note, I have made a simplistic Ticker.h library for the ESP32 (see here) to drive some leds and button interrupts on the gateway, which seems to work pretty well while no messages are received, however as soon as a node starts transmitting, the wdt reset happens much more often than before (like at least once a minute). Again, I assume this is related to the lengthiness of the interruptHandler(), and more specifically to the relatively high times which are spent between calling noInterrupts() and interrupts() or maybeInterrupts(). I also noticed that the wdt seems to happen more often when the devices are further away, or if I interfere with the antennas. This also supports the above assumption, as the frame sending and receival times are increased, which means that more time is spent in the relevant sections of the code, leading to more frequent resets.~

Without undermining the usefulness of this RFM69 library (i.e. I mean no offence to @LowPowerLab), I think it is a great library for AVRs and ESP8266, but I will start experimenting with the Radiohead library, at least until a fix is found for this one. Even though it does not officially support the ESP32 yet, I did some digging around in their forum and found that someone has already done some work to add ESP32 compatibility (see here). After enquiring about it, he was kind enough to share his fork, which he says has been working fine for him, but which from what I can tell has only been tested with an RFM95. I suspect that his contribution sits in the lower-level library drivers, and thus the RFM69 should also work. Even if it "works", though, it would still remain to check whether it suffers from similar (wdt) issues. In any case, I will have a play around and will keep you updated.

UPDATE: I have just seen the last correspondence you have had under espressif/arduino-esp32#624, which I guess means that whatever I said above regarding the interrupts is invalid, and thus has been crossed out. At least it sounds like you have managed to identify the problem. 👍

rrobinet commented 7 years ago

@sglvladi Yes, it looks like the issue is that SPI is handled as an interrupt and therefore may not be used inside an interrupt routine. They have proposed a patch to me, but it seems to make things worse rather than better. However, I will continue with this ESP32 issue, hoping for a workaround that makes the RFM69 library compatible with the WeMos LOLIN. Note that I will be abroad for one week and not able to follow it.

A long time ago I had a look at the Radiohead library, which seemed very complex for me to understand (too many modules). Also, I use a patched version of the RFM69 library for secure RFM sessions (RFM69_Sessionkey). So if I want to use the WeMos LOLIN, I need a working version of the RFM69 one. See you in one week. Robert

rrobinet commented 7 years ago

@sglvladi OK, I am finalising an RFM69 library where all SPI transfers have been moved from the RFM69X::interruptHandler() routine to a new one that runs during RFM69X::receiveDone(). It seems to work pretty well. I have tested it on Moteino, Arduino and WeMos LOLIN, but I still have to run some tests on the WeMos D1 mini and on an Arduino with an Ethernet controller. Once done, I will submit it to you so that you can also test it. @LowPowerLab Felix, of course this is a major change to the current library: it also requires the virtualised library(ies) to be adapted, and it makes the data processing a little slower, so I imagine that this new version will be, and stay, an exotic one. Do you agree to my publishing it on GitHub?
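
The restructuring described in this comment (the interrupt only records that the radio needs attention, and the SPI transfers happen later from receiveDone()) can be sketched in plain C++. This is only a minimal illustration of the general pattern, not the RFM69X code: `FakeRadioFifo`, `DeferredRadio`, `onInterrupt()` and `spiReads` are hypothetical names, and the SPI bus is simulated with a counter so the sketch runs on a desktop compiler.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the radio FIFO. On real hardware,
// readPacket() would be a series of SPI transfers.
struct FakeRadioFifo {
    std::vector<uint8_t> pending;  // bytes "waiting" inside the radio
    int spiReads = 0;              // number of simulated SPI transfers

    std::vector<uint8_t> readPacket() {
        ++spiReads;
        std::vector<uint8_t> out;
        out.swap(pending);         // hand over the bytes, leave the FIFO empty
        return out;
    }
};

// Hypothetical driver skeleton showing deferred interrupt handling.
class DeferredRadio {
public:
    explicit DeferredRadio(FakeRadioFifo& fifo) : fifo_(fifo) {}

    // Interrupt context: only set a flag. No SPI access, no waiting loops.
    void onInterrupt() {
        packetReady_.store(true, std::memory_order_release);
    }

    // Main-loop context: the slow bus work happens here, outside the ISR.
    bool receiveDone() {
        // Atomically consume the flag so one interrupt is handled once.
        if (!packetReady_.exchange(false, std::memory_order_acq_rel)) {
            return false;
        }
        data_ = fifo_.readPacket();
        return !data_.empty();
    }

    const std::vector<uint8_t>& data() const { return data_; }

private:
    FakeRadioFifo& fifo_;
    std::atomic<bool> packetReady_{false};
    std::vector<uint8_t> data_;
};
```

In a sketch, `onInterrupt()` would be what the pin interrupt calls, and `loop()` would poll `receiveDone()` as the gateway example already does. The trade-off is the one mentioned above: the packet is only read on the next poll, so processing is a little slower, but the time spent in interrupt context stays short.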

LowPowerLab commented 7 years ago

@rrobinet Yes I agree, this can be kept as a fork dedicated mainly to ESP devices.

rrobinet commented 7 years ago

@sglvladi and @LowPowerLab I have posted https://github.com/rrobinet/RFM69X_Library and https://github.com/rrobinet/RFM69X_SessionKey-Library, updated versions tested with the WeMos LOLIN32 and expected to be compatible with all ESP32 processors. I use the letter 'X' as the ESP32 extension. The RFM69X is just a modification of the last RFM69 library (july 2014), with the following remarks:

  1. It doesn't work with the RFM69_ATC virtualised library, which, if necessary, should be adapted to cope with the new interruptHandler / interruptHandling.
  2. It also uses the standard SPI_HAS_TRANSACTION syntax, according to the new SPI library.
  3. It works with the Ethernet W5100 shield without specific changes, because SPI transactions during data reception are no longer handled by the interrupt routine.
  4. One extra test is done while starting the RFM instance to see if the RFM transceiver is present. This can be tested through a conditional start-up, typically:
    if (!radio.initialize(FREQUENCY,NODEID,NETWORKID))
    {
      Serial.println ("\n****************************************************************");
      Serial.println (" WARNING: RFM Transceiver initialisation failure: Set-up Halted  ");
      Serial.println ("****************************************************************");
      while (1); // Halt the process
    }
  5. The virtualised RFM69X_SessionKey library should be used for secure data exchanges, using a one-time key for each exchange.
  6. The package includes two examples, to send and receive data. These examples automatically detect the processor type and allow different options to activate the different libraries for test purposes.

This version was (not extensively) tested with: Arduino UNO/MEGA, Moteino/MEGA, WeMos LOLIN32, and WeMos D1 mini and Pro, and looks to be OK. I note that the RFM69X_SessionKey on the WeMos LOLIN is a little slower than the original one.

And finally, it is an AS IS version with no guarantee of support. If you agree, I believe this issue may be closed. Robert

LowPowerLab commented 7 years ago

Thank you @rrobinet. Consider adding a forward-compatible license.