esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

SoftwareSerial: wdt reset #1426

Closed supersjimmie closed 5 years ago

supersjimmie commented 8 years ago

I use both SoftwareSerial and WiFiClient. During normal operation it sometimes works for many hours, but other times it gives a wdt reset. There seems to be an interaction between SoftwareSerial and WiFiClient.

If I call mySerial.enableRx(false) before starting the function that uses the WiFiClient, it seems to keep working; otherwise it seems to crash in different places.

`ets Jan 8 2013,rst cause:4, boot mode:(3,6) wdt reset`

This was during client.print() operations: the first client.print with the data succeeded, then just after that, only when sending an empty line with client.print("\r\n"), it gave the wdt reset.

But earlier, it happened just while receiving data on the SoftwareSerial: `ets Jan 8 2013,rst cause:4, boot mode:(1,6)`

And a couple of other times also during normal receiving on the serial: `ets Jan 8 2013,rst cause:4, boot mode:(3,7)`

So many different boot modes, they all seem only to occur if I don't use the mySerial.enableRX(false).
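For readers decoding these boot lines, here is a small sketch of how the two fields are usually read. It follows the boot message documentation of the esp8266/Arduino core as far as I understand it; the function names are made up for illustration and are not part of any library. Only the first number of the boot mode pair is decoded here.

```cpp
#include <cassert>
#include <string>

// Meaning of "rst cause:N" as printed by the ESP8266 boot ROM.
inline std::string rstCauseName(int cause) {
  switch (cause) {
    case 1: return "normal boot";
    case 2: return "reset pin";
    case 3: return "software reset";
    case 4: return "watchdog reset";
    default: return "unknown";
  }
}

// The first number in "boot mode:(A,B)" is the state of the strapping pins
// GPIO15, GPIO0 and GPIO2 at reset, read as a 3-bit value.
inline int strappingToBootMode(bool gpio15, bool gpio0, bool gpio2) {
  return (gpio15 ? 4 : 0) | (gpio0 ? 2 : 0) | (gpio2 ? 1 : 0);
}

inline std::string bootModeName(int mode) {
  assert(mode >= 0 && mode <= 7);
  switch (mode) {
    case 1: return "UART download";
    case 3: return "boot from SPI flash";
    case 4: case 5: case 6: case 7: return "SDIO";
    default: return "invalid";
  }
}
```

Read this way, all three logs above report the same reset cause (4, watchdog); what differs is the boot mode the chip came up in afterwards.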


supersjimmie commented 8 years ago

Ah, too bad, it also happens when I use enableRx(true): `ets Jan 8 2013,rst cause:4, boot mode:(1,6) wdt reset`

I think I had better leave the GitHub version for now; 2.0.0 is more stable in this respect.

plerup commented 8 years ago

Please supply some code and information about baud rate and traffic volume

supersjimmie commented 8 years ago

After more investigation, the title can be changed to "SoftwareSerial wdt reset".

#include <SoftwareSerial.h>
SoftwareSerial mySerial(D2, -1, true, 128);

char telegram[64];

int readTelegram();

void setup()
{
  Serial.begin(115200);  // 115200  256000
  mySerial.begin(115200);
}

void loop() {
  int t = readTelegram();
  yield();
}

int readTelegram() {
  if (mySerial.available() > 0) {
    int len = 0;
    char c;
    unsigned long timeout = 2;
    unsigned long start = millis();
    while ((len < 63) && (millis() - start < timeout)) {
      c = mySerial.read();
      if  ((c == '\n') || (c == '\r')) break;
      if ((c >= 32) && (c <= 126)) {
        telegram[len++] = c;
      }
      yield();
    }
    telegram[len] = 0;
    return len;
  }
  else return 0;
}

Pin D2 is connected to my "Smart Meter", which generates 27 lines, all terminated with \r\n. Each line is at most 47 characters long (the shortest is 5). It runs at 115200 baud with an inverted signal. The lines are sent every 10 sec.

Sometimes there is some garbage, and sometimes no \r\n is received, so I have a timeout (2 mSec) and a filter (printable ASCII, c = 32-126).

Earlier I tried readBytesUntil; it also crashed. So I have now written it with just read() and my own filter and timeout.

Also tried with a shorter timeout, using micros() and a value of 500.

After running for a couple of minutes, mostly less than an hour, it crashes with a WDT reset.

plerup commented 8 years ago

This is a rather large burst of interrupts. Try using something like the code below which intends to read all the lines into a buffer and then treat the lines in the buffer afterwards

  #include <SoftwareSerial.h>

  #define MAX_SIZE 27*47
  SoftwareSerial mySerial(D2, -1, true, MAX_SIZE);

  char buffer[MAX_SIZE];

  void setup()
  {
    Serial.begin(115200);  // 115200  256000
    mySerial.begin(115200);
    Serial.println("Started");
  }

  void loop() {
     int pos = 0;
     while (mySerial.available() && pos < MAX_SIZE) {
        buffer[pos++] = mySerial.read();
     }
     if (pos) {
        // Data received, treat the buffer contents
        Serial.println(pos);
     }
  }
plerup commented 8 years ago

BTW, which version are you using?

supersjimmie commented 8 years ago

I think that when I read all lines into one large buffer, it will still crash after some time. The series of lines seems to have three different time gaps: a very short gap between each line, a slightly longer gap just before the last line (which contains a checksum, but that does not matter) and then the long 10 sec gap. I cannot predict exactly when data comes, so there will always be interrupts at unpredictable moments.

BTW: The WDT resets occur with Arduino IDE 1.6.7 and esp8266 latest GIT. Last reset with exactly the code above: rst cause: 4, boot mode:(1,7)

When I use IDE 1.6.5 with esp8266 2.0.0 Stable, it runs fine for as long as I let it run. So why are the interrupts a problem with the latest version, while they are not with the older one?

(I think, not for sure, I started with softwareserial 2.1.1 or 2.2)

supersjimmie commented 8 years ago

I have changed the code, to get the whole telegram before continuing:

#include <SoftwareSerial.h>
SoftwareSerial mySerial(D2, -1, true, 128);

#define MAX_LINES 28
#define MAX_LENGTH 64
char buffer[MAX_LINES * MAX_LENGTH];

int readTelegram();

void setup()
{
  Serial.begin(115200);
  mySerial.begin(115200);
}

void loop() {
  int t = readTelegram();
}

int readTelegram() {
  if (mySerial.available() > 0) {
    memset(buffer, 0, sizeof(buffer));
    int len = 0;
    char c;
    unsigned long timeout = 700;
    unsigned long start = millis();
    while ((len < MAX_LINES * MAX_LENGTH) && (millis() - start < timeout)) {
      if (mySerial.available()) {
        c = mySerial.read();
        yield();
        buffer[len++] = c;
        Serial.write(c);
      }
      yield();
    }
    return len;
  }
  return 0;
}

No luck after a while:

 ets Jan  8 2013,rst cause:4, boot mode:(1,7)
wdt reset

Second attempt, boot mode:(1,6)

supersjimmie commented 8 years ago

Having the resets with: Arduino IDE 1.6.5 and 1.6.7, ESP-Arduino libraries 2.0.0 Stable and latest GIT, and SoftwareSerial 2.2.

Works with SoftwareSerial 2.1 and all IDE and ESP library versions (but with 2.1 I have too much corrupted received data).

supersjimmie commented 8 years ago

@plerup I also tried your code example from 2 days ago. It was running for over an hour without a reset. (perhaps I should find time to let it run longer) But... As soon as I made one simple change:

    buffer[pos++] = mySerial.read();

To:

    buffer[pos] = mySerial.read();
    Serial.print(buffer[pos]);
    pos++;

WDT reset within a couple of minutes.

EDIT: No, too bad. Your original code just crashed too with a wdt reset.

 #include <SoftwareSerial.h>

  #define MAX_SIZE 29*50
  SoftwareSerial mySerial(D2, -1, true, MAX_SIZE);

  char buffer[MAX_SIZE];

  void setup()
  {
    Serial.begin(115200);  // 115200  256000
    mySerial.begin(115200);
    Serial.println("Started");
  }

  void loop() {
     int pos = 0;
     while (mySerial.available() && pos < MAX_SIZE) {
        buffer[pos++] = mySerial.read();
     }
     if (pos) {
        // Data received, treat the buffer contents
        Serial.println(pos);
     }
  }
supersjimmie commented 8 years ago

So today I tried it with a hardware inverter and set the "inverted" flag to "false" in SoftwareSerial. But I still get the WDT resets.

odilonafonso commented 8 years ago

Hi, which SoftwareSerial did you use? I cannot find this constructor:

SoftwareSerial mySerial(D2, -1, true, MAX_SIZE);

in the SoftwareSerial that I use. The SoftwareSerial I have has this constructor:

SoftwareSerial::SoftwareSerial(uint8_t receivePin, uint8_t transmitPin, bool inverse_logic /* = false */) : 
  _rx_delay_centering(0),
  _rx_delay_intrabit(0),
  _rx_delay_stopbit(0),
  _tx_delay(0),
  _buffer_overflow(false),
  _inverse_logic(inverse_logic)
{
  setTX(transmitPin);
  setRX(receivePin);
}

I have had problems using the SoftwareSerial library. Apparently the buffer it uses is not large enough to communicate with the ESP8266.

What is the fourth argument of this constructor (MAX_SIZE) used for?

supersjimmie commented 8 years ago

I use the latest mentioned here: https://github.com/esp8266/Arduino/blob/master/doc/libraries.md#softwareserial

SoftwareSerial(int receivePin, int transmitPin, bool inverse_logic = false, unsigned int buffSize = 64);
odilonafonso commented 8 years ago

Thanks. One more question: I found SoftwareSerial.h and SoftwareSerial.cpp in "/opt/arduino-1.6.7/hardware/arduino/avr/libraries/SoftwareSerial" (I use Linux). Do I have to delete them and put the new files there, or can I put the new library in my local libraries directory, /home/odilon/Arduino/libraries? In what order does the Arduino IDE search for include files?

plerup commented 8 years ago

@odilonafonso The SoftwareSerial you are referring to is the AVR version. If you are using the staging release of esp8266/arduino you will automatically get the correct version

plerup commented 8 years ago

@supersjimmie The only reason I can think of for your problem is that a lot of data is coming in a burst, or that there is some noise on the GPIO pin. In those cases there will be constant calls to the interrupt routine, and yielding will not occur before the watchdog resets the chip. You could try disabling the watchdog, but then your WiFi will probably be flaky.

supersjimmie commented 8 years ago

I don't think it is noise, but you may be right that it has something to do with the amount of data. If it was noise, the data would be corrupted but it is very clear data.

As far as I understand now, the crashes occur if I do any small other thing while data is coming in. Even just printing the received character makes it all unstable. For now it looks stable when I only capture data and store it in a buffer, without anything else such as checking (is it a valid character) or printing it.

int readTelegram() {
  int len = 0;
  char c;
  unsigned long maxwait = 10000;
  unsigned long maxreceive = 650;
  memset(buffer, 0, sizeof(buffer));

  // Allow receiving data
  Serial.println("waiting...");
  mySerial.enableRx(true);

  // Wait max 10 sec for data
  unsigned long startwait = millis();
  while ((mySerial.available() == 0) && (millis() - startwait < maxwait)) {
    yield();
    ESP.wdtFeed();
  }

  // If data available, get entire telegram
  if (mySerial.available() > 0) {
    unsigned long startreceive = millis();
    while ((len < MAX_LINES * MAX_LENGTH) && (millis() - startreceive < maxreceive)) {
      if (mySerial.available() > 0) {
        buffer[len++] = mySerial.read();
      }
    }
  }

  // Stop receiving data
  mySerial.enableRx(false);

  // If any data has been received, show it
  return len;
}

I also noticed that I had to use enableRx(false) to avoid the interrupts when my code is doing something else (like sending data to the logging server) while new data is coming in on the line.

pieman64 commented 8 years ago

@supersjimmie if you are using SoftwareSerial (and your reference to D2 rather than GPIO X) does that mean you are using an ESP with an Arduino rather than the ESP as an Arduino? The reason I ask is that I use Arduino with ESP / USB. Obviously USB and ESP are completely different but almost identical sketches crash regularly with the ESP whereas they are fine with the USB. Maybe I have a similar problem to you.

supersjimmie commented 8 years ago

No, I only use an ESP8266. D2 is used because on an ESP8266 NodeMCU dev board the pins are labeled D0..D8, which are then translated to normal GPIO numbers (D2 is GPIO4).

Here to be precise: https://github.com/esp8266/Arduino/blob/master/variants/nodemcu/pins_arduino.h#L37-L59
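As an illustration of that translation, the sketch below spells out the D0..D8 mapping as I know it from the NodeMCU variant file linked above, together with the pin range check that the SoftwareSerial listing later in this thread applies (isValidGPIOpin). The names `gpioForLabel` and `softSerialPinOk` are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstdint>

// NodeMCU silk-screen labels D0..D8 mapped to ESP8266 GPIO numbers.
constexpr uint8_t kNodeMcuPins[9] = {16, 5, 4, 0, 2, 14, 12, 13, 15};

// Translate a label index (2 for "D2") to its GPIO number.
inline uint8_t gpioForLabel(unsigned d) {
  assert(d < 9);
  return kNodeMcuPins[d];
}

// Same range test as SoftwareSerial::isValidGPIOpin() in the listing in this
// thread: GPIO 0-5 and 12-15 are accepted, everything else is rejected.
inline bool softSerialPinOk(int pin) {
  return (pin >= 0 && pin <= 5) || (pin >= 12 && pin <= 15);
}
```

With this mapping D2 resolves to GPIO4, which passes the check, while D0 (GPIO16) would be rejected by that version of the library.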

pieman64 commented 8 years ago

So neither an Arduino with an ESP nor a standalone ESP, but more of a hybrid, the NodeMCU?

supersjimmie commented 8 years ago

NodeMCU is just an ESP12 on a board with some extra features. https://www.google.nl/search?q=nodemcu So a standalone ESP.

supersjimmie commented 8 years ago

@plerup disabling the WDT before this function and re-enabling it afterwards does not solve it, it even seems worse. The only work-around I could find is to do absolutely nothing during the time that any data comes into the SoftwareSerial buffer. Which is very hard to do, because I cannot predict when data can be expected to come. So for now all I can do is just create a waiting loop in my program and do nothing else than waiting for data. This makes the rest of all my code stop for a long time. During receiving I can still not do anything else than only putting it into a buffer.
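One way to avoid a blocking wait loop on the sketch side is to collect bytes whenever loop() runs and treat a quiet line as the end of a telegram. The class below is only a sketch of that idea under my own assumptions (the name `TelegramCollector`, the 1024-byte buffer and the idle-gap rule are all invented here). It restructures the polling code only; it does not change what happens inside the interrupt handler, so it is not claimed to fix the reset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Collects received bytes and reports a complete telegram once the line has
// been idle for idleGapMs. Time is passed in, so on the ESP it can be driven
// by millis() and on a PC by plain numbers.
class TelegramCollector {
 public:
  explicit TelegramCollector(uint32_t idleGapMs) : idleGapMs_(idleGapMs) {}

  // Store one received byte; bytes beyond the buffer capacity are dropped.
  void feed(uint8_t b, uint32_t nowMs) {
    if (len_ < sizeof(buf_)) buf_[len_++] = b;
    lastByteMs_ = nowMs;
  }

  // True when data is buffered and no byte has arrived for idleGapMs.
  // Unsigned subtraction keeps this correct when the millisecond timer wraps.
  bool complete(uint32_t nowMs) const {
    return len_ > 0 && (nowMs - lastByteMs_) >= idleGapMs_;
  }

  size_t length() const { return len_; }
  const uint8_t* data() const { return buf_; }
  void reset() { len_ = 0; }

 private:
  uint8_t buf_[1024];
  size_t len_ = 0;
  uint32_t lastByteMs_ = 0;
  uint32_t idleGapMs_;
};
```

In a sketch, loop() would call `feed(mySerial.read(), millis())` while `mySerial.available()` is non-zero, then check `complete(millis())` and process and `reset()` the buffer. The idle gap has to be longer than the pauses inside a telegram and shorter than the 10 sec pause between telegrams.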

ps-nl commented 8 years ago

Same problem occurs here, though that may be explained given that I run a modified version of supersjimmie's code. Reboots mostly with error "rst cause:4, boot mode:(3,6) wdt reset".

My environment is the latest github release of the esp8266 framework and espsoftwareserial 2.2, compiled with the platformio IDE environment. The board I use is an Wemos D1 Mini.

rwkiii commented 8 years ago

@supersjimmie - maybe you've already ensured power, but don't take the power requirements lightly. I was able to perform firmware updates and program my ESPs without problem. Running them is a different story! :-P

I cannot stress enough: make sure you have plenty of power to your ESP. I don't think an Arduino can supply the needed power. I also ordered 5 V to 3.3 V converters so I could use the Arduino's power connectors. Very hit and miss. Sometimes it worked, other times it failed, and that tends to lead to experimentation with the sketch or pin connections.

Of course, ESPs are rated at 2.7 V to 3.3 V or something like that, but they draw upwards of 250 mA. The Arduino cannot supply that. Also, many FTDI adapters only pull 100 mA from the USB connector. Some pull 500 mA. I forget what the determination is there...

I ordered some power supply units for my breadboards, several types that I wanted to try out. They would not all work: I received the same wdt resets you are getting. Replace the power supply with a different one and it works fine. A sure-fire way for me was a regular 9-volt battery.

After I realized how power-hungry these ESPs are, I've been making sure my power supply is always adequate. I can't tell you how much more reliable things have been with regard to unwanted resets and non-responsive ESPs.

Despite this, I still have to make 3, 4, or even 5 attempts to get a sketch uploaded to them. They work fine - but they really are buggy. Keep that in mind. The wdt reset problem though may be due to power.

Just a thought!

supersjimmie commented 8 years ago

@rwkiii Yes I know about power.
I have no other problems whatsoever that would make power even slightly suspect. I have double-checked everything and have much more power and capacitance than is ever recommended. I can isolate the problem exactly to what I am describing in the code; any other (much larger) function runs normally. Even WiFi (power-hungry) functions and reading/writing other devices work fine. It is already very clear that the problem only occurs at exactly the point I am describing. Also, somebody else is having exactly the same problem with a completely different type of board, power supply, etc.

The ESP can run fine for hours and even multiple days when I work around the problem, and it will fail within minutes when I use the problematic code. There is much more code running on it; it just fails here and nowhere else.

plerup commented 8 years ago

Yes, let's bring this thread back on track. These are the facts:

SoftwareSerial depends on interrupts on the Rx pin. If these come in constantly, e.g. during a burst of many characters, all CPU time will be spent in the interrupt routine. Consequently the hardware watchdog will not be fed and will reset the ESP once its maximum time has elapsed. To my knowledge there is no way to disable the hardware watchdog, only the software one.

The only solution I can think of would be to let SoftwareSerial disable the interrupts after a certain time, or when the buffer gets full. Either way characters will be lost.

If anyone has any better solution feel free to send pull requests.

I guess we all have to look forward to the ESP32 where all WiFi and BT communication will be handled in a separate cpu.

supersjimmie commented 8 years ago

Is this really such a big burst then? It is a maximum of 700 chars and about halfway there is a slight delay (maybe a few mSec) and another longer delay just before the last 6-8 chars (a checksum). So it is:

  1. About 350 chars,
  2. a short delay,
  3. again about 350 chars,
  4. a longer delay,
  5. 6-8 chars.

350 chars at 115k2 would take only about 30 mSec (at 10 bits per character); that's not really long? Including all delays, the total is always done within 650 mSec (maybe even a bit less). Is there a way I can do some testing with disabling the interrupts when the buffer gets full? I think that would be easy to test.
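A quick check of the burst length, as a sketch: it assumes the usual 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per character) and back-to-back characters. The helper name `burstMicros` is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Microseconds needed to transfer `chars` characters at `baud`, assuming
// 8N1 framing (10 bits per character) with no gaps between characters.
inline uint64_t burstMicros(uint32_t chars, uint32_t baud) {
  assert(baud > 0);
  return (static_cast<uint64_t>(chars) * 10u * 1000000u) / baud;
}
```

Under these assumptions one character takes about 87 microseconds at 115200 baud, a 350-character block about 30 ms and the whole 700-character telegram about 61 ms, not counting the pauses in between.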

But what's strange, or may bring a solution, it seemed to be working with the Stable 2.0.0 core esp8266/arduino. But I'm not absolutely sure about that (must think about a thorough easy way to test that again).

igrr commented 8 years ago

Note that the HW watchdog is set to trigger in 6 seconds by default, so these watchdog resets you are seeing are not directly caused by a long chain of interrupts. Looks more like some lock-up, i.e. an endless loop within an interrupt.

supersjimmie commented 8 years ago

As you can see in the crashing examples above, I don't think there is a problematic loop? Perhaps one is somewhere in the softwareserial itself?

plerup commented 8 years ago

Can you please try your case with the 2.0.0 version of esp8266/arduino and SoftwareSerial 2.2 if you think there is a difference.

I have never seen this behavior in my own usage of the library.

I had a problem before with interrupts being called constantly until the complete gpio interrupt vector was reset and thereby causing a loop. But that is done in the current version.

juancgalvez commented 8 years ago

First I have to thank Peter for his development. This software was very useful to me.

I downloaded the latest version from GitHub and, after having some trouble, found three issues with this SoftwareSerial, which I am going to explain.

First, the timing is based on the CCOUNT cycle counter (I mean it uses "ESP.getCycleCount()"), but doesn't consider that it overflows about every 53 seconds. If the counter overflows inside the WAIT macro, it is going to loop for 53 seconds, making the watchdog reset the system. To fix it, replace line 113, which contains:

#define WAIT { while (ESP.getCycleCount()-start < wait); wait += m_bitTime; }

with:

#define WAIT { \
  unsigned long cc = ESP.getCycleCount(); \
  while (ESP.getCycleCount()-start < wait) \
  { \
    if (ESP.getCycleCount() < cc) { \
      start += m_bitTime; \
      wait += m_bitTime; \
    } \
  } \
  wait += m_bitTime; \
}

Second, asynchronous serial communication should be NRZ (Non return to Zero). TX pin is not being set HIGH at the beginning but just before transmitting data. This causes the first byte to fail the async serial protocol. The pin must be set HIGH at the beginning. To do that what I did was to add, after line 62 which contains:

pinMode(m_txPin, OUTPUT);

the line:

digitalWrite(m_txPin, HIGH);

Third, when reading a byte the sampling is done at the edge at the beginning of a bit. In my case, the device transmitting data at 9600 bps used 89 microseconds for HIGH and 125 microseconds for LOW instead of about 104 for both, making reading inconsistent. This is not an error in the SoftwareSerial library but in the device I was using. To solve it I added a half-bit-time delay just before starting to read bits. This should work for any speed. The code changes were:

I replaced lines 149, 150 and 151, which contained:

   unsigned long wait = m_bitTime;
   unsigned long start = ESP.getCycleCount();
   uint8_t rec = 0;

with

   unsigned long wait = m_bitTime / 2;
   unsigned long start = ESP.getCycleCount();
   WAIT;  // wait to be in the middle of the bit time
   uint8_t rec = 0;
   wait = m_bitTime;
   start = ESP.getCycleCount();

I tested changes at 9600 bps and everything worked fine.
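To put numbers on the half-bit delay, the sketch below mirrors the m_bitTime computation from SoftwareSerial::begin() and adds up the initial delay plus one full bit, which is where the first data bit gets sampled in the modified rxRead(). It assumes an 80 MHz CPU clock for the examples; `bitTimeCycles` and `firstSampleOffset` are illustrative names, and interrupt entry overhead is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// CPU cycles per serial bit, mirroring m_bitTime in SoftwareSerial::begin().
inline uint32_t bitTimeCycles(uint32_t cpuMHz, uint32_t baud) {
  assert(baud > 0);
  return cpuMHz * 1000000u / baud;
}

// Cycles between entering rxRead() and the first data-bit sample when an
// initial delay of bitTime / divisor is followed by one full bit time.
inline uint32_t firstSampleOffset(uint32_t bitTime, uint32_t divisor) {
  assert(divisor > 0);
  return bitTime / divisor + bitTime;
}
```

At 9600 baud a bit lasts 8333 cycles, so a half-bit delay is large compared with any fixed overhead. At 115200 baud a bit is only 694 cycles, so the time spent getting into the interrupt handler is a much bigger share of the bit; that may be why a smaller initial delay behaves differently at high speed, but this is a guess rather than something measured here.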

I hope these "fixes" work and are useful to someone.

This is the complete SoftwareSerial.cpp code after the changes I made:

/*

SoftwareSerial.cpp - Implementation of the Arduino software serial for ESP8266.
Copyright (c) 2015 Peter Lerup. All rights reserved.

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

*/

#include <Arduino.h>

// The Arduino standard GPIO routines are not enough,
// must use some from the Espressif SDK as well
extern "C" {
#include "gpio.h"
}

#include <SoftwareSerial.h>

#define MAX_PIN 15

// List of SoftSerial object for each possible Rx pin
SoftwareSerial *InterruptList[MAX_PIN+1];
bool InterruptsEnabled = false;

SoftwareSerial::SoftwareSerial(int receivePin, int transmitPin, bool inverse_logic, unsigned int buffSize) {
   m_rxValid = m_txValid = false;
   m_buffer = NULL;
   m_invert = inverse_logic;
   if (isValidGPIOpin(receivePin)) {
      m_rxPin = receivePin;
      m_buffSize = buffSize;
      m_buffer = (uint8_t*)malloc(m_buffSize);
      if (m_buffer != NULL) {
         m_rxValid = true;
         m_inPos = m_outPos = 0;
         pinMode(m_rxPin, INPUT);
         if (!InterruptsEnabled) {
            ETS_GPIO_INTR_ATTACH(handle_interrupt, 0);
            InterruptsEnabled = true;
         }
         InterruptList[m_rxPin] = this;
         GPIO_REG_WRITE(GPIO_STATUS_W1TC_ADDRESS, BIT(m_rxPin));
         enableRx(true);
      }
   }
   if (isValidGPIOpin(transmitPin)) {
      m_txValid = true;
      m_txPin = transmitPin;
      pinMode(m_txPin, OUTPUT);
      digitalWrite(m_txPin, HIGH);
   }
   // Default speed
   begin(9600);
}

SoftwareSerial::~SoftwareSerial() {
   enableRx(false);
   if (m_rxValid)
      InterruptList[m_rxPin] = NULL;
   if (m_buffer)
      free(m_buffer);
}

bool SoftwareSerial::isValidGPIOpin(int pin) {
   // Some GPIO pins are reserved by the system
   return (pin >= 0 && pin <= 5) || (pin >= 12 && pin <= MAX_PIN);
}

void SoftwareSerial::begin(long speed) {
   // Use getCycleCount() loop to get as exact timing as possible
   m_bitTime = ESP.getCpuFreqMHz()*1000000/speed;
}

void SoftwareSerial::enableRx(bool on) {
   if (m_rxValid) {
      GPIO_INT_TYPE type;
      if (!on)
         type = GPIO_PIN_INTR_DISABLE;
      else if (m_invert)
         type = GPIO_PIN_INTR_POSEDGE;
      else
         type = GPIO_PIN_INTR_NEGEDGE;
      gpio_pin_intr_state_set(GPIO_ID_PIN(m_rxPin), type);
   }
}

int SoftwareSerial::read() {
   if (!m_rxValid || (m_inPos == m_outPos)) return -1;
   uint8_t ch = m_buffer[m_outPos];
   m_outPos = (m_outPos+1) % m_buffSize;
   return ch;
}

int SoftwareSerial::available() {
   if (!m_rxValid) return 0;
   int avail = m_inPos - m_outPos;
   if (avail < 0) avail += m_buffSize;
   return avail;
}

#define WAIT { \
  unsigned long cc = ESP.getCycleCount(); \
  while (ESP.getCycleCount()-start < wait) \
  { \
    if (ESP.getCycleCount() < cc) { \
      start += m_bitTime; \
      wait += m_bitTime; \
    } \
  } \
  wait += m_bitTime; \
}

size_t SoftwareSerial::write(uint8_t b) {
   if (!m_txValid) return 0;

   if (m_invert) b = ~b;
   // Disable interrupts in order to get a clean transmit
   cli();
   unsigned long wait = m_bitTime;
   //digitalWrite(m_txPin, HIGH);
   unsigned long start = ESP.getCycleCount();
    // Start bit;
   digitalWrite(m_txPin, LOW);
   WAIT;
   for (int i = 0; i < 8; i++) {
     digitalWrite(m_txPin, (b & 1) ? HIGH : LOW);
     WAIT;
     b >>= 1;
   }
   // Stop bit
   digitalWrite(m_txPin, HIGH);
   WAIT;
   sei();
   return 1;
}

void SoftwareSerial::flush() {
   m_inPos = m_outPos = 0;
}

int SoftwareSerial::peek() {
   if (!m_rxValid || (m_inPos == m_outPos)) return -1;
   return m_buffer[m_outPos];
}

void ICACHE_RAM_ATTR SoftwareSerial::rxRead() {
   unsigned long wait = m_bitTime / 2;
   unsigned long start = ESP.getCycleCount();
   WAIT;  // wait to be in the middle of the bit time
   uint8_t rec = 0;
   wait = m_bitTime;
   start = ESP.getCycleCount();
   for (int i = 0; i < 8; i++) {
     WAIT;
     rec >>= 1;
     if (digitalRead(m_rxPin))
       rec |= 0x80;
   }
   if (m_invert) rec = ~rec;
   // Stop bit
   WAIT;
   // Store the received value in the buffer unless we have an overflow
   int next = (m_inPos+1) % m_buffSize;
   if (next != m_inPos) {
      m_buffer[m_inPos] = rec;
      m_inPos = next;
   }
}

void ICACHE_RAM_ATTR SoftwareSerial::handle_interrupt(void *arg) {
   uint32_t gpioStatus = GPIO_REG_READ(GPIO_STATUS_ADDRESS);
   // Clear the interrupt(s) otherwise we get called again
   GPIO_REG_WRITE(GPIO_STATUS_W1TC_ADDRESS, gpioStatus);
   ETS_GPIO_INTR_DISABLE();
   for (uint8_t pin = 0; pin <= MAX_PIN; pin++) {
      if ((gpioStatus & BIT(pin)) && InterruptList[pin]) {
         // Seems like the interrupt is delivered on all flanks in regardless
         // of what edge that has been set. Hence ignore unless we have a start bit
         if (digitalRead(pin) == InterruptList[pin]->m_invert)
            InterruptList[pin]->rxRead();
      }
   }
   ETS_GPIO_INTR_ENABLE();
}
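A side note on the listing above: the overflow guard in rxRead() compares `next` with m_inPos. Because `next` is (m_inPos+1) % m_buffSize, the two can only be equal when the buffer size is 1, so the guard never rejects a byte and a full buffer wraps around over unread data. A common alternative is to compare against the read index instead. The class below is a generic sketch of that technique with invented names, not the library's code, and whether this detail has anything to do with the resets discussed here is not established.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal ring buffer; one slot is kept free so that in == out always means
// "empty" and (in + 1) % N == out means "full".
template <size_t N>
class RingBuffer {
  static_assert(N >= 2, "need at least two slots");

 public:
  // Returns false and drops the byte when the buffer is full.
  bool push(uint8_t b) {
    size_t next = (in_ + 1) % N;
    if (next == out_) return false;  // full: storing would overrun unread data
    buf_[in_] = b;
    in_ = next;
    return true;
  }

  // Returns -1 when empty, like SoftwareSerial::read().
  int pop() {
    if (in_ == out_) return -1;
    uint8_t b = buf_[out_];
    out_ = (out_ + 1) % N;
    return b;
  }

  size_t available() const { return (in_ + N - out_) % N; }

 private:
  uint8_t buf_[N];
  size_t in_ = 0;
  size_t out_ = 0;
};
```

With this check a buffer of N slots holds at most N-1 unread bytes, and later bytes are dropped until the reader catches up.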
supersjimmie commented 8 years ago

Thanks @juancgalvez for contributing to my issue. Are you having the same issues or different issues? I ask this because I see you speak about TX, while my (this) issue is about WDT resets during RX.

supersjimmie commented 8 years ago

@plerup I am now doing several tests:

First idea is that 2.0.0 Stable is the one without the issue, but it is very difficult to be sure because sometimes they all run fine for some longer time.

@juancgalvez I am also testing your changes. When I first tried, I got only corrupt data. That was fixed by reverting to this:

   unsigned long wait = m_bitTime;
   unsigned long start = ESP.getCycleCount();
   uint8_t rec = 0;

So your version of that does not work here (at 115200).

EDIT: Latest GIT (from today) with the parts from juancgalvez version looked better. (with the original from plerup some more chars were corrupted sometimes). But within an hour: WDT Reset cause 4.

plerup commented 8 years ago

The arithmetic in WAIT uses unsigned 32-bit values, so counter overflow/wraparound shouldn't be a problem. There are several notes on this in Arduino forums and the like, here for instance:

http://www.utopiamechanicus.com/article/handling-arduino-microsecond-overflow/
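The wraparound argument can be checked on any machine with fixed-width types. The sketch below uses uint32_t to stand in for the ESP8266's 32-bit `unsigned long`, reproduces the comparison from the original WAIT loop, and also computes how long the counter takes to wrap. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed cycles between two readings of a free-running 32-bit counter.
// Unsigned subtraction is modulo 2^32, so the result stays correct even if
// the counter wrapped once between `start` and `now`.
inline uint32_t elapsedCycles(uint32_t start, uint32_t now) {
  return now - start;
}

// Same condition as the original WAIT loop: keep spinning while this is true.
inline bool stillWaiting(uint32_t start, uint32_t now, uint32_t wait) {
  return elapsedCycles(start, now) < wait;
}

// Seconds until a 32-bit cycle counter wraps at the given CPU clock.
inline double rolloverSeconds(uint32_t cpuMHz) {
  assert(cpuMHz > 0);
  return 4294967296.0 / (cpuMHz * 1e6);
}
```

A start value just below 2^32 followed by a small reading after the wrap still yields the true number of elapsed cycles, so the loop exits on time. At 80 MHz the counter wraps roughly every 53.7 seconds, which is where the 53-second figure mentioned earlier in the thread comes from.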

supersjimmie commented 8 years ago

Test with esp8266/Arduino Stable 2.0.0 and SoftwareSerial 2.2 kept running for 3 hours. Now I have restarted it and changed the sending device to send a telegram every 2 sec instead of every 10 sec. (I use an Arduino Nano now to emulate the TX from smartmeter)

EDIT: And at 2h later, the second attempt with Stable 2.0.0 and SoftwareSerial 2.2 keeps running fine. (I have another 1.5h available today for testing, so I restart it now again)

EDIT 2: About 1:30h later it is still running without a wdt reset. I think we can be confident that it keeps running with the Stable esp8266/arduino 2.0.0 version. (will do more testing again with latest GIT later today/tomorrow)

juancgalvez commented 8 years ago

@supersjimmie I was having wdt resets very often. After the changes I made they stopped (well, not 100%: I still have random resets, sometimes within minutes and sometimes after almost a day). Apart from wdt resets, I was having issues reading and transmitting data at 9600 bps, so I mentioned all my issues here. I guess I should have tested at other speeds before posting. I am going to test at different speeds from 300 to 115200 bps.

@plerup Peter, I am going to read the article you referred and comment on this.

plerup commented 8 years ago

I actually had some random WDT resets myself (not using SoftwareSerial) when using the tip version of esp8266/Arduino at some point, and this made me move back to 2.0.0.

supersjimmie commented 8 years ago

I have a lot more code running together with latest git and softwareserial 2.2 without wdt resets; I only had to rewrite the serial receiving part as described earlier. That code receives the serial telegram every 10 sec but also reads data from 3 ds18b20 (onewire) temperature sensors and 2 DHT22 temp/humidity sensors, gets weather info from wunderground and sends all collected data to sparkfun and 3 thingspeak channels every minute. This has all been running for 30 hours without any restart now.

When I use stable 2.0.0, softwareserial misses a lot of (first) bits, which I do not experience with git version.

juancgalvez commented 8 years ago

@plerup. You are right. The WAIT macro modification is not needed. I thought it was the cause of my device resets, but it seems it is not. I am using version 2.0.0 and having reset issues. I think the modification to set the TX pin HIGH at the beginning is necessary. Some delay before reading the pin value is also needed. I tested from 300 to 38400 and it worked when dividing m_bitTime by 2, and at 115200 when dividing by 4.

@supersjimmie I did test at 115200 and you need to divide by 4 instead of two.

supersjimmie commented 8 years ago

I don't understand that part but divided by 4 did the trick.

   unsigned long wait = m_bitTime / 4;

I now receive data with your mods.

juancgalvez commented 8 years ago

@supersjimmie. Yes. that line.

supersjimmie commented 8 years ago

Next test: arduino/esp8266 latest GIT with softwareserial 2.0.0 and juancgalvez's mods is still crashing.

supersjimmie commented 8 years ago

@plerup and @juancgalvez it is still hard to figure out when and where my tests crash.

Sometimes it keeps running for hours, so then I wait for nothing. Still, I am sure the problem exists in the softwareserial code, because somebody else confirmed it too. He is also using it for reading his smartmeter and contacted me because of this issue. (I sent him this issue; he is ps-nl, who commented a few days ago)

I also know somebody else also using this code but with arduino/esp8266 2.0.0 and softwareserial 2.2, and he does not have this issue.

What I have noticed twice when I saw the actual crash is that it looked like the system froze for several seconds and then came up with the wdt reset. So it looks like a freeze/loop problem. The only idea I can think of is that the WAIT part is the only loop that could become infinite. Is there some (dirty?) trick to put in there to prevent it from causing an infinite loop? Like some extra counter and a break? Better to lose some data than to crash.
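The "extra counter and a break" idea could look roughly like the sketch below. It is a host-testable illustration, not a patch for the library: `boundedWait` and `maxSpins` are invented names, and `readCounter` stands in for ESP.getCycleCount(). Note that the guard counts loop iterations, not time, so a suitable limit would have to be found by experiment.

```cpp
#include <cassert>
#include <cstdint>

// Busy-wait until `wait` cycles have elapsed since `start`, but give up after
// maxSpins iterations. Returns true if the wait completed normally and false
// if the guard tripped (for example because the counter stopped advancing).
template <typename CounterFn>
bool boundedWait(CounterFn readCounter, uint32_t start, uint32_t wait,
                 uint32_t maxSpins) {
  for (uint32_t spins = 0; spins < maxSpins; ++spins) {
    // Wraparound-safe comparison, as in the original WAIT macro.
    if (static_cast<uint32_t>(readCounter() - start) >= wait) return true;
  }
  return false;
}
```

A caller that gets `false` back could drop the byte being received and return from the interrupt handler, trading a corrupted character for not hanging.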

juancgalvez commented 8 years ago

I am having random resets too. My ESP8266 has been up for almost 24 hours, but just before the last reset I had almost 10 resets in one hour. The worst thing is that the ESP8266 doesn't run my program after resetting but just stays frozen. Very annoying.

plerup commented 8 years ago

Considering the simplicity of the WAIT macro, the only thing I can think of that could lock the loop would be ESP.getCycleCount() not being incremented. You could try using SoftwareSerial version 1.2. It uses micros() instead of getCycleCount().

supersjimmie commented 8 years ago

@plerup As far as I can remember, earlier versions of softwareserial didn't crash. So that too supports the possibility of the loop. But the main reason for not using earlier versions is that they have too many lost (first) bits/bytes, which makes it impossible to read the telegrams. Could the getCycleCount problem be related to what juancgalvez said about rollover?

@juancgalvez the ESP not restarting after a reset is a known problem with esp8266/arduino after 2.0.0. I have that problem too. I have read somewhere that this happens only after flashing; you should be able to "solve" it by resetting with the reset button after flashing. After that the module will restart properly in case of a crash. See https://github.com/esp8266/Arduino/issues/1017

supersjimmie commented 8 years ago

Forget this, I'm on a different track now.

~~About ESP.CycleCount() rollover... I did a simple test by checking for a rollover in my main loop by showing cyclecount, thus seeing when it has rolled over. I received a WDT reset about 50-60 sec after a previous rollover. That sounds very close to the 53 sec rollover mentioned by juancgalvez. This with the original softwareserial 2.2. In the meantime I have been able to do this twice, second time also around a rollover moment.~~

Cycle Count: 535884238
Cycle Count: 1260044284
Cycle Count: 1984204222
Cycle Count: 2708364345
Cycle Count: 3432524297
Cycle Count: 4156684378
Cycle Count: 585876884
Cycle Count: 1310117152
Cycle Count: 2034197090
Cycle Count: 2758436872
Cycle Count: 3482596758
Cycle Count: 4206757066

 ets Jan  8 2013,rst cause:4, boot mode:(1,6)

wdt reset

(As you can see, it does not always fail at a rollover, probably only when the rollover occurs exactly within the WAIT?)

This is also not always true. After more testing, some other wdt resets came more than 10 sec later than a rollover.

plerup commented 8 years ago

Please provide the full source code for your test.

Do you have a script which simulates the input as well?

supersjimmie commented 8 years ago

I have now created a receiving script for the ESP and a sending script on an Arduino:

#define MAXSIZE 128

#include <ESP8266WiFi.h>
#include <SoftwareSerial.h>
SoftwareSerial mySerial(D2, -1, true, MAXSIZE);

void setup()
{
  Serial.begin(115200);
  mySerial.begin(115200);
}

void loop() {
  if (mySerial.available()) {
    Serial.write( mySerial.read());
  }
  yield();
  ESP.wdtFeed();
}

And the sending sketch on the Arduino:

#define SERIAL_TX       13  // TX
#define SERIAL_RX       12  // RX
#include <SoftwareSerial.h> 
SoftwareSerial mySerial(SERIAL_RX, SERIAL_TX, 64);

void setup() {
  Serial.begin(115200);
  mySerial.begin(115200);
}

void loop() {
  mySerial.write(85);
}

First I turn on the ESP and let it stay stable (and quiet) for a minute. Then I turn on the Arduino; the ESP receives a lot of bytes, freezes for a few seconds and crashes with a wdt reset. The whole session is done within 10-30 sec after the Arduino starts sending.

plerup commented 8 years ago

Hmm, this is somewhat different from your original real setup, I guess. In this case there is a constant inflow of interrupts all the time. What happens if you put a small delay in the Arduino loop?

supersjimmie commented 8 years ago

Yes, the original setup sends about 600-700 bytes every 10 sec, with a short delay halfway and another delay just before the end. The crashes happen a lot less often then, which is logical (nothing is happening most of the time). Therefore I created this test, just to provoke the crash within a short time (easier to debug then).

With a delayMicroseconds(50); in the sending loop, the ESP seems to hold, at least for a few minutes now. (But I cannot tell the smartmeter to wait between each char) From this we could learn that a constant stream causes the crash.

So, I created a bit more realistic test from the sending perspective:

void loop() {
  for (int i=0; i < 500; i++) {  // send 500 char
    mySerial.write(85);
  }
  delay(500);  // half sec pause
}

With i at 500 chars it crashes soon, 400 seems to hold a bit longer, 300 looks good. Increasing the softwareserial buffer to 512 (>500) does not solve it.