esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

Rebooting when joining WiFi #431

Closed sticilface closed 9 years ago

sticilface commented 9 years ago

Using latest version 1.6.1-esp8266-1-1054-g3183c7c

I have a project that has been working fine (at least booting and joining WiFi). Now it crashes when it tries to join WiFi:

NORMAL ACCESS MODE ENABLED
Scanning for Networks

Joining Wifi Network....
 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
e���(�SQS�(RQ�)HT�)SHHHC���r

I'm not sure what to do next to help debug this. I should add that the same code compiled with a previous IDE version works fine.
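For reading boot banners like the one above, here is a minimal host-side sketch that pulls the numbers out of the "rst cause" line. The names BootInfo, parseBootLine and describeResetCause are hypothetical. The cause-to-text table is an assumption based on my reading of the ESP8266 Arduino core's boot-message documentation; only cause 4 is confirmed by the "wdt reset" lines in the logs in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Numbers extracted from a ROM boot banner such as
// " ets Jan  8 2013,rst cause:2, boot mode:(3,7)".
struct BootInfo {
    bool valid = false;
    int rstCause = -1;
    int bootMode = -1;   // first number inside "boot mode:(x,y)"
    int bootExtra = -1;  // second number inside "boot mode:(x,y)"
};

// Hypothetical helper: returns valid == false if the line is not a boot banner.
BootInfo parseBootLine(const std::string& line) {
    BootInfo info;
    const std::size_t pos = line.find("rst cause:");
    if (pos == std::string::npos) {
        return info;
    }
    const int fields = std::sscanf(line.c_str() + pos,
                                   "rst cause:%d, boot mode:(%d,%d)",
                                   &info.rstCause, &info.bootMode, &info.bootExtra);
    info.valid = (fields == 3);
    return info;
}

// ASSUMPTION: mapping as I understand the core's documentation; verify before relying on it.
const char* describeResetCause(int cause) {
    switch (cause) {
        case 1: return "normal boot";
        case 2: return "reset pin";
        case 3: return "software reset";
        case 4: return "watchdog reset";
        default: return "unknown";
    }
}
```

Feeding each captured serial line through parseBootLine makes it easier to count how many of the reboots were watchdog resets and how many were reported as something else.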

sticilface commented 9 years ago

Figured it out....

This used to work just fine...

int i = 0;
while ((WiFi.status() != WL_CONNECTED) && i < 40) {
    delay(500);
    i++;
    Serial.print(".");
    if (i == 39) Serial.print("Failed");
}

but now it seems to need parentheses around each operand of the &&:

int i = 0;
while ((WiFi.status() != WL_CONNECTED) && (i < 40)) {
    delay(500);
    i++;
    Serial.print(".");
    if (i == 39) Serial.print("Failed");
}

sticilface commented 9 years ago

OK, that has not fixed it. It seems that some small code changes can cause this, and then un-cause it.
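That the extra parentheses made no difference is what C++ operator precedence predicts: != and < both bind tighter than &&, so the two loop conditions are the same expression. A small host-side check, using stand-in status values (FakeStatus is hypothetical; the real WL_CONNECTED comes from the WiFi library):

```cpp
#include <cassert>

// Stand-ins for the Arduino status values. The numbers are illustrative only.
enum FakeStatus { FAKE_IDLE = 0, FAKE_CONNECTED = 3 };

// Condition as originally written: only the left operand is parenthesised.
bool withoutExtraParens(FakeStatus status, int i) {
    return (status != FAKE_CONNECTED) && i < 40;
}

// Condition with parentheses around both operands of &&.
bool withExtraParens(FakeStatus status, int i) {
    return (status != FAKE_CONNECTED) && (i < 40);
}

// Returns true if both forms agree for every retry count 0..99 and both statuses.
bool formsAgree() {
    for (int i = 0; i < 100; ++i) {
        if (withoutExtraParens(FAKE_IDLE, i) != withExtraParens(FAKE_IDLE, i)) {
            return false;
        }
        if (withoutExtraParens(FAKE_CONNECTED, i) != withExtraParens(FAKE_CONNECTED, i)) {
            return false;
        }
    }
    return true;
}
```

Since the forms agree everywhere, a change in crash behaviour after such an edit points at something else that shifted in the rebuilt binary, not at the condition itself.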

holgerlembke commented 9 years ago

Not really helping, just dispensing some wisdom:

My two top rules:
-- always use {}, even if not needed.
-- always use (), even if not needed.

I don't write if (a==1) b=2; but always if (a==1) { b=2; }

It is just too difficult to remember how if (a & 2 <7) ... or if (a * 2<7) ... are evaluated. What was the precedence rule again?

And as far as I can see, I use this in WiFi client mode in my code without any problem.
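To answer the precedence question concretely: in C++ the relational operator < binds tighter than bitwise &, while * binds tighter than <. So a & 2 < 7 is read as a & (2 < 7), which is a & 1, whereas a * 2 < 7 is read as (a * 2) < 7. The sketch below spells out both readings with explicit parentheses (the function names are made up for this example):

```cpp
#include <cassert>

// How the compiler reads `a & 2 < 7`: the comparison is evaluated first,
// giving `a & 1` (true converts to 1).
int asParsed(int a) {
    return a & (2 < 7);
}

// What the author most likely intended.
bool asIntended(int a) {
    return (a & 2) < 7;
}

// Multiplication binds tighter than <, so this one needs no parentheses.
bool timesTwoBelowSeven(int a) {
    return a * 2 < 7;
}
```

For a = 2 the two readings of the & expression disagree (0 versus true), which is a good argument for the "always use ()" rule.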

av1024 commented 9 years ago

I have a similar issue with one specific router (Mikrotik 951), but not with another (Allied Telesis AT-WA1104G). I get irregular resets by the WDT. Sometimes a hardware reset solves the issue, but not always.

BUT! When I switch my program to a "more verbose" mode (no recompile, just a CLI command that changes the "logging" flag), I get no resets. There are a lot of delay(...) calls inside the main loop, so there is possibly no room for locking up.

PS: I use 1.6.5 git build from 2015-05-28

sticilface commented 9 years ago

I suspect that there is something going on... I just had several reboots, and then it worked all of a sudden. This is on an ESP-12. The previous flash failed; then I changed it from 512K to 4M and it worked. So it is quite an intermittent problem:

�*h!�`�h
�����怘�~3f<��<���f<����`<?f<~?�3�rl��r��c�n�����p�|����x��ǒ��p�nn��;�n�����b�cl`$`�p�n�������l�����b�n��n�$����b��<~�n�����l`���#�n�rnr���;���lxrr�ےn�����`�����(�SQS�(RQ�)HT�)SHHHC���r
Welcome to Andrew's ESP Software

EMERGENCY ACCESS MODE ENABLED
Scanning for Networks

Joining Wifi Network....
 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
e���(�SQS�(RQ�)HT�)SHHHC���r
Welcome to Andrew's ESP Software

EMERGENCY ACCESS MODE ENABLED
Scanning for Networks

Joining Wifi Network....
 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
e���(�SQS�(RQ�)HT�)SHHHC���r
Welcome to Andrew's ESP Software

EMERGENCY ACCESS MODE ENABLED
Scanning for Networks

Joining Wifi Network.rl��r��c�n�����p�|����x��ǒ��p�nn��;�n�����b�cl`$`�p�n�������l�����b�n��n�$����b��>~�n��Ì�l`���#�n�rnr���;���lxrr�ےn����`�����(�SQS�(RQ�)HT�)SHHHC���r
Welcome to Andrew's ESP Software

EMERGENCY ACCESS MODE ENABLED
Scanning for Networks

Joining Wifi Network....
 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
e���(�SQS�(RQ�)HT�)SHHHC���r
Welcome to Andrew's ESP Software

EMERGENCY ACCESS MODE ENABLED
Scanning for Networks
rl��r��c�n�����p�|����x��ǒ��p�nn��;�n�����b�$rrp�n�������l�����b�n��n�$����b��<~�n�����l`���#�n�rnr���;���lxrr�ےn����`�����(�SQS�(RQ�)HT�)SHHHC���r
Welcome to Andrew's ESP Software

EMERGENCY ACCESS MODE ENABLED
Scanning for Networks

Joining Wifi Network.........
Connected to fyffest
IP address: 192.168.1.162
Initiating MQTT Connection: Connecting: Success
MQTT msg SENT: esp/speed, Message: 192.168.1.162
MQTT msg SENT: speed/IP, Message: 192.168.1.162
MQTT msg SENT: speed/Version, Message: WS2812
MQTT msg SENT: speed/Status, Message: Device Ready
Current wifi mode is : 1
MQTT Message Recieved: speed/IP => 192.168.1.162
MQTT Message Recieved: speed/Version => WS2812
MQTT Message Recieved: speed/Status => Device Ready

Commander13 commented 9 years ago

I am also having an issue similar to yours. My code loops through open WiFi networks, attempting to connect to each one. Every few loops it crashes on a WiFi.begin(ssid) call. I have the most recent commit, and I've tried many different delays, using/not using WiFi.disconnect(), using a fake password [WiFi.begin(ssid, pass)], stripping down the code and rewriting it line by line, trying different Serial baud rates, trying different memory sizes, and trying the ESP-01 and ESP-12... It runs fine until I start trying to connect to a variety of APs.

This is what I add that causes the issue (names[i] holds all the open networks found and is set up properly):

for (int i = 0; i < netcount; i++) {
    ESP.wdtFeed();
    Serial.println(WiFi.localIP());
    WiFi.begin(names[i]);
    delay(5000);
    Serial.println(names[i]);
    Serial.println(WiFi.localIP());
    WiFi.disconnect();
    delay(3000);
}

I also tried this code without feeding the watchdog and with removing the delays:

for (int i = 0; i < netcount; i++) {
    Serial.println(WiFi.localIP());
    WiFi.begin(names[i]);
    Serial.println(names[i]);
    Serial.println(WiFi.localIP());
    WiFi.disconnect();
}

Still resets. Output:

CHELSEA
0.0.0.0
0.0.0.0
HP852D2E
0.0.0.0
0.0.0.0
Pubert's-guest
0.0.0.0
0.0.0.0

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
erp“à;ÏÈ„ÄÛñ9’œØÒSetup done

I'm not sure if it's my code or if it is an SDK issue.

holgerlembke commented 9 years ago

Just fishing in the dark, but could you try a

void *p = malloc(5000); if (p) { free(p); } else { Serial.println("Heap full"); }

somewhere in your loop? The ESP.getFreeHeap() function might report misleading values...

Commander13 commented 9 years ago

Thanks for the suggestion. I modified like so:

for (int i = 0; i < netcount; i++) {
    ESP.wdtFeed();
    Serial.println(WiFi.localIP());
    void *p = malloc(5000);
    if (p) {
        free(p);
    } else {
        Serial.println("Heap full");
    }
    WiFi.begin(names[i], names[i]);
    void *g = malloc(5000);
    if (g) {
        free(g);
    } else {
        Serial.println("Heap full");
    }
    delay(5000);
    Serial.println(names[i]);
    Serial.println(ESP.getFreeHeap());
    Serial.println(WiFi.localIP());
    WiFi.disconnect();
    void *pnus = malloc(5000);
    if (pnus) {
        free(pnus);
    } else {
        Serial.println("Heap full");
    }
    delay(3000);
}

But it still rebooted without giving me a Heap full...

scan start
scan done
59 networks found
Open nets: 15
0.0.0.0
KindFlamingo-guest
28712
192.168.34.12
0.0.0.0

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
erp›à;OÈ„ÄÙq9’œØb¬ÿSetup done

Commander13 commented 9 years ago

This is what the Debugger gives me:

[W]sec 1073686212 error
scandone

0.0.0.0
scandone
reconnect
Fatal exception (28):
epc1=0x4000debe, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
scandone
del if0
usl
sul 0 0

This error is killing me.

av1024 commented 9 years ago

I tried the code below, called every minute in the main loop:

// === TEST ===
  uint16_t a0, a1, a2;
  void *_tmp;
  a0 = ESP.getFreeHeap();
  Serial.print("Heap-test : "); Serial.print(a0); Serial.print("/"); 
  _tmp = malloc(5000);
  if (!_tmp) {
    Serial.print("FULL/");
  } else {
    a1 = ESP.getFreeHeap();
    free(_tmp);
    Serial.print(a1); Serial.print("/"); 
  }
  a2 = ESP.getFreeHeap();
  Serial.println(a2);
  // ====

and got "Heap-test : 22896/17880/22896" printed each time (+/- 16 bytes). So it is possibly not a heap issue. But I still have no resets here. Emulating "bad WiFi" by setting a wrong password for the SSIDs also has no effect.

igrr commented 9 years ago

Fatal exception (28):
epc1=0x4000debe, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
scandone
del if0
usl
sul 0 0

memcmp function was passed a zero for the first argument and generated an exception.
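One way to keep a null pointer from reaching memcmp in your own code is to check before comparing. This is only a sketch of the idea: safeBytesEqual is a hypothetical helper, not something from the SDK or the core, and it cannot protect calls made inside the closed system libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: compares two buffers, treating a null pointer as
// "not equal" instead of handing it to memcmp, which must not receive one.
bool safeBytesEqual(const void* a, const void* b, std::size_t len) {
    if (a == nullptr || b == nullptr) {
        return false;
    }
    return std::memcmp(a, b, len) == 0;
}
```

The same pattern applies to strings handed to the WiFi API: check that the pointer is non-null before passing it on.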

holgerlembke commented 9 years ago

The problem with getFreeHeap() is that it gives the total sum of all free heap blocks. So if you have 20 free blocks of 150 bytes each, you get 3000 bytes of free heap, but malloc(200) will fail because no single block of that size is free.

Basically the function is useless in its current state. And that's why I work on a heap walk utility. Current state: fail. :-/
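The arithmetic in that example can be reproduced with a toy model that tracks nothing but the sizes of the free blocks. FreeList and fragmentedExample are hypothetical names for this illustration; the model says nothing about how the real allocator stores its blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy model of a fragmented heap: only the sizes of the free blocks are kept.
struct FreeList {
    std::vector<std::size_t> blockSizes;

    // What a "total free heap" counter would report.
    std::size_t totalFree() const {
        return std::accumulate(blockSizes.begin(), blockSizes.end(), std::size_t{0});
    }

    // The largest single request that could succeed.
    std::size_t largestBlock() const {
        return blockSizes.empty()
                   ? 0
                   : *std::max_element(blockSizes.begin(), blockSizes.end());
    }

    // A request fits only if one contiguous block is big enough.
    bool canAllocate(std::size_t bytes) const {
        return largestBlock() >= bytes;
    }
};

// The scenario from the comment above: 20 free blocks of 150 bytes each.
FreeList fragmentedExample() {
    return FreeList{std::vector<std::size_t>(20, 150)};
}
```

In this model totalFree() is 3000 while largestBlock() is 150, so a 200-byte request fails even though the total looks generous.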

av1024 commented 9 years ago

As an update about the heap: I get a "No heap available, failed to malloc 0" message every ~14-15 sec after setDebugOutput(true). There are no 15-second events in my code (only 1-second and 60-second ones, driven by delay(10) in loop).

The same error appeared in ESP8266WiFiMulti when malloc(<empty-string>) was called, so I wrote my own enumeration class.

UPD: not reproduced on a clean sketch... Will try to debug PubSubClient by Imroy...

UPD2: Yes, I have found the source of my malloc(0) message. Look at mqtt.cpp:readPacket(), called for example by PubSubClient::loop().

igrr commented 9 years ago

I think the error in @Commander13 's case is one of names[i] being a null pointer.

Regarding the "No heap available, failed to malloc 0" messages, Espressif's malloc implementation erroneously prints this message when you try to allocate zero bytes. I will add a check to our malloc wrapper so that these messages are not generated.
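A zero-size check of the kind described could look roughly like this. It is a host-side sketch only: quietMalloc and g_allocFailureMessages are hypothetical, and this is not the wrapper that ended up in the core.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Counts failure messages so the behaviour can be checked in a test.
int g_allocFailureMessages = 0;

// Hypothetical wrapper: a zero-byte request returns nullptr quietly instead of
// being forwarded to an allocator that would log it as an out-of-heap error.
void* quietMalloc(std::size_t size) {
    if (size == 0) {
        return nullptr;  // nothing to allocate, nothing to report
    }
    void* p = std::malloc(size);
    if (p == nullptr) {
        ++g_allocFailureMessages;
        std::fprintf(stderr, "allocation of %zu bytes failed\n", size);
    }
    return p;
}
```

Callers then have to treat a null result for a zero-size request as normal rather than as an error, which standard malloc already permits.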

Regarding heap fragmentation, it is quite hard to achieve such an extreme case in practice without contrived scenarios. I did have hooks installed on the allocator functions, pvPortMalloc/vPortFree, when I had just started developing this core. I wanted to check whether dynamic allocation is usable at all. The hooks just logged the malloc/free calls to the console, so I could analyze what was happening. I have to say, with most sketches from the standard Arduino example set which I was able to compile, fragmentation was a non-issue in the long run. While real apps are more complex than these examples, I still think a bit of thought and planning goes a long way toward eliminating issues caused by heap fragmentation.

That is not to deter @holgerlembke from the task of writing a heap walking utility, I just want to point out that it is highly unlikely that this is the root cause of the problem reported here :)

sticilface commented 9 years ago

I'm not entirely sure how to proceed with debugging. For me this only happens on an ESP-12 module, not on any of the ESP-01 modules or a NodeMCU board.

Second, my sketch never gets to the loop. In fact it never gets past the line that joins the WiFi, where it would print success, before it reboots.

av1024 commented 9 years ago

@sticilface, just add Serial.setDebugOutput(true) after Serial.begin(...).

Yet another test with resets. I got the same debug output a few times:

beacon timeout
rm match
pm close 7 0 0/383183288

 ets Jan  8 2013,rst cause:4, boot mode:(3,0)

wdt reset
load 0x40100000, len 29904, room 16 
tail 0
chksum 0x4f
load 0x3ffe8000, len 1580, room 8 
tail 4
chksum 0x53
load 0x3ffe8630, len 4228, room 4 
tail 0
chksum 0x12
csum 0x12
rl

sticilface commented 9 years ago

Ah... I'm getting more info than when I tried that before. Here is what I get:

Joining Wifi Networkf -240, scandone
Fatal exception (28):
epc1=0x4020a6e6, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
e���(�SQS�(RQ�)HT�)SHHHC���r

NORMAL ACCESS MODE ENABLED
Scanning for Networks
scandone
f 0, scandone

Joining Wifi Networkf -240, scandone
Fatal exception (28):
epc1=0x4020a6e6, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
e���(�SQS�(RQ�)HT�)SHHHC���r

NORMAL ACCESS MODE ENABLED
Scanning for Networks
scandone
f 0, scandone

av1024 commented 9 years ago

What about a minimal sketch? If there is no error with the minimal sketch, try disabling parts of your code and check again:

#include <ESP8266WiFi.h>

void setup() {
  Serial.begin(115200);
  Serial.setDebugOutput(true);  // print SDK debug messages on Serial
  WiFi.begin("hardcoded-ssid", "hardcoded-pass");
}

void loop() {
  // intentionally empty
}

UPD: Hey! Do you use non-alphanumeric characters in the password? AFAIR, some special symbols kept resetting my ESP until I changed the password. It may have been '&' or '$'...

TimeTravelingOwls commented 9 years ago

You might also try calling WiFi.mode() before calling WiFi.begin().

The SDK seems to hold on to your previous WiFi.mode() setting and will, for instance, try to connect to an AP based on that setting even if you don't call WiFi.begin(). That makes it difficult to debug, because it appears random.

Commander13 commented 9 years ago

As per the suggestion of @igrr I checked to ensure that names[i] was not null:

for (int i = 0; i < netcount; i++) {
    ESP.wdtFeed();
    Serial.println(WiFi.localIP());
    if (names[i] != NULL) {
        WiFi.begin(names[i]);
        delay(5000);
        Serial.println(names[i]);
        Serial.println(WiFi.localIP());
        WiFi.disconnect();
    }
    delay(3000);
}

Unfortunately it still reboots...

sticilface commented 9 years ago

I call WiFi.mode() before; no change.

My way around it is to hold down reset for a few seconds, then just blast reset about 5 times in a row. Then it usually works.

I'm doubtful that it is a code issue on my end, as it was working just fine on this ESP for weeks. It stopped working the moment I started using the IDE that makes one binary, and my code works fine on other ESPs so far. This is my test ESP though, so it gets used the most!

holgerlembke commented 9 years ago

You could try to remove power, too.

av1024 commented 9 years ago

Try calling WiFi.disconnect() first. A simple mode change does not work for me: the ESP still connects via the saved config. As a "last chance", try a full erase: write blank512k.bin from the SDK via esptool.

Is this (test) ESP on a breadboard? Maybe it is an electrical/pull-up issue?

PS: I built the latest git version an hour ago and have no issues with either WiFi or malloc(0).

sticilface commented 9 years ago

adding WiFi.disconnect() makes no difference

av1024 commented 9 years ago

I have constant reboots with git versions now. There are no reboots with the older build (from May 28, uploaded as two binary files).

scandone
Fatal exception (28):
epc1=0x402090ce, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
 ets Jan  8 2013,rst cause:2, boot mode:(1,6)
 ets Jan  8 2013,rst cause:4, boot mode:(1,6)
wdt reset
scandone
Fatal exception (28):
epc1=0x40209212, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

Exception (28):
epc1=0x40209212 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: sys 
sp: 3ffffd30 end: 3fffffb0 offset: 01a0

>>>stack>>>
3ffffed0:  feefeffe 3ffec078 3ffe9874 40209185  
3ffffee0:  3fff36dc 3fff4558 3fff3344 4020807e  
3ffffef0:  40233493 00000000 00000004 402334a6  
3fffff00:  00000001 666b6469 3fff0061 3fff314c  
3fffff10:  00000001 4021ad0e 3fff4558 3fff4558  
3fffff20:  00000000 00000005 c9000000 40233432  
3fffff30:  3fff4558 40233428 00000000 000000ff  
3fffff40:  402318a7 3fff4558 4021e23d 3ffec330  
3fffff50:  00000000 40231663 3fff36f8 00000000  
3fffff60:  3fffdcb0 3fff36f8 00000000 3fffdcb0  
3fffff70:  40231af8 4021932b 3ffec078 40201018  
3fffff80:  4021932b 40219348 3fffdab0 3ffea170  
3fffff90:  4021937e 3fffdab0 3ffeb1bc 40201dd1  
3fffffa0:  40000f49 40000f49 3fffdab0 40000f49  
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
e

igrr commented 9 years ago

From the output, it looks like a null pointer exception inside the system libraries. But I need your .elf file to see where exactly. Could you please upload the elf output file which corresponds to the latest output?
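The "null pointer" reading follows from two fields in the dump: exception cause 28 is the Xtensa LoadProhibited cause, and excvaddr (the address whose access faulted) is 0. Below is a rough host-side sketch for pulling those fields out of a captured log. ExceptionInfo, readHexField, parseException, causeName and looksLikeNullRead are hypothetical names, and only three cause numbers are mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

struct ExceptionInfo {
    bool valid = false;
    unsigned cause = 0;          // number in "Fatal exception (N):"
    std::uint32_t epc1 = 0;      // program counter at the fault
    std::uint32_t excvaddr = 0;  // address whose access faulted
};

// Reads the hex value that follows `key` (e.g. "epc1=") in `line`.
bool readHexField(const std::string& line, const std::string& key, std::uint32_t& out) {
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) {
        return false;
    }
    unsigned int value = 0;
    if (std::sscanf(line.c_str() + pos + key.size(), "%x", &value) != 1) {
        return false;
    }
    out = static_cast<std::uint32_t>(value);
    return true;
}

// Parses the header line ("Fatal exception (28):") and the register line.
ExceptionInfo parseException(const std::string& headerLine, const std::string& registerLine) {
    ExceptionInfo info;
    const std::size_t paren = headerLine.find('(');
    if (paren == std::string::npos ||
        std::sscanf(headerLine.c_str() + paren, "(%u)", &info.cause) != 1) {
        return info;
    }
    info.valid = readHexField(registerLine, "epc1=", info.epc1) &&
                 readHexField(registerLine, "excvaddr=", info.excvaddr);
    return info;
}

// Only the cause numbers that appear in this thread, plus StoreProhibited.
const char* causeName(unsigned cause) {
    switch (cause) {
        case 0:  return "IllegalInstruction";
        case 28: return "LoadProhibited";
        case 29: return "StoreProhibited";
        default: return "other";
    }
}

// A prohibited load from address 0 is the typical signature of a null pointer read.
bool looksLikeNullRead(const ExceptionInfo& e) {
    return e.valid && e.cause == 28 && e.excvaddr == 0;
}
```

The epc1 value is the address to look up in the .elf file, which is why the matching binary is needed to find the exact function.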

av1024 commented 9 years ago

FYI, I've rebuilt the main loop (via the old 2-file version, because it is "working") with a lot of debug prints, and it looks like the WiFi system's reconnecting code does not update the WDT. I can see a lot of "scandone/reconnecting to ..." debug messages without control returning to loop(). For now I use a Ticker as a software watchdog: it invokes reset() after [30] seconds, and the ESP is rebooted by the Ticker.
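The timing logic behind such a Ticker-based software watchdog can be sketched on the host as follows. SoftwareWatchdog is a hypothetical class written for this illustration; timestamps are assumed to come from a millis()-style 32-bit millisecond counter that wraps around.

```cpp
#include <cassert>
#include <cstdint>

// Software watchdog: loop() feeds it, a periodic timer callback checks it.
class SoftwareWatchdog {
public:
    explicit SoftwareWatchdog(std::uint32_t timeoutMs)
        : timeoutMs_(timeoutMs), lastFedMs_(0) {}

    // Call from loop() to signal that the main loop is still running.
    void feed(std::uint32_t nowMs) { lastFedMs_ = nowMs; }

    // Call from the periodic callback. The unsigned subtraction keeps the
    // comparison correct across wrap-around of the millisecond counter.
    bool expired(std::uint32_t nowMs) const {
        return static_cast<std::uint32_t>(nowMs - lastFedMs_) >= timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t lastFedMs_;
};
```

On the device, the idea would be to call feed(millis()) at the top of loop() and to have a Ticker callback that runs once a second restart the chip when expired(millis()) returns true.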

Log for attached elf:

#12203 MAIN/setup: .bmp.begin
#12214 MAIN/setup: .ds.begin
#12277 MAIN/setup: .ds.rescan
Found 1 1-wire sensors
28-56b6c7020000a1
#12346 MAIN/setup: dht.begin
f 0,  = WIFI Settings =
Mode: STA
PHY mode: N
Channel: 1
AP id: 0
Status: 1
Auto connect: 1
SSID (5): idkfa
Passphrase (10): *********
BSSID set: 0
 MAC: 18:fe:34:9e:82:d0
 IP: 0.0.0.0
 SSID: idkfa (31dBm), channel: 1
 Status: DISCONN
 * Telnet started.
#12747 MAIN/setup:  mqtt...
#12771 MAIN/setup: .mqtt.begin
#12774 MAIN/setup: cli.add
 * Register callback for 'print'
 * Register callback for 'reset'
 * Register callback for 'restart'
 * Register callback for 'monitor'
 * Register callback for 'eeprom'
 * Register callback for 'debug'
 * Register callback for 'esp'
 * Register callback for 'wifi'
 * Register callback for 'mqtt'
 * Register callback for 'offs'
 * Register callback for 'telnet'
 * Register callback for 'ntp'
 * Register callback for 'led'
 * Register callback for 'out'
#13788 MAIN/loop: 
#13790 MAIN/loop: wifi_connect(10s)
scandone
Fatal exception (28):
epc1=0x4020956e, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

Exception (28):
epc1=0x4020956e epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

ctx: sys 
sp: 3ffffd30 end: 3fffffb0 offset: 01a0

>>>stack>>>
3ffffed0:  3fff1fe0 00008640 3ffe998c 402094e1  
3ffffee0:  3fff37fc 3fff4678 3fff3464 402083da  
3ffffef0:  402337ef 00000000 00000004 40233802  
3fffff00:  00000001 666b6469 3fff0061 3fff326c  
3fffff10:  00000001 4021b06a 3fff4678 3fff4678  
3fffff20:  00000000 00000005 c9000000 4023378e  
3fffff30:  3fff4678 40233784 00000000 000000ff  
3fffff40:  40231c03 3fff4678 3fff2008 3ffee168  
3fffff50:  00000000 402319bf 3fff3818 00000000  
3fffff60:  3fffdcb0 3fff3818 00000000 3fffdcb0  
3fffff70:  40231e54 3ffec6f0 000000f5 3ffec6f0  
3fffff80:  40219687 3fff2008 3fffdab0 3ffea290  
3fffff90:  402196da 3fffdab0 00000000 3fffdcc0  
3fffffa0:  40000f49 40000f49 3fffdab0 40000f49  
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(1,6)

 ets Jan  8 2013,rst cause:4, boot mode:(1,6)

wdt reset

Hmmm... can't attach file... https://www.dropbox.com/s/rwi1nf7cznngey7/sens_v2.cpp.elf?dl=0

igrr commented 9 years ago

These stack traces are a real life saver!

av1024 commented 9 years ago

OK. Both the exception and the failure to return to the main loop() seem to be solved.

av1024 commented 9 years ago

Yet another exception, as in Imroy's issue 8.

Turning off only the WiFi router is possibly enough to trigger it:

#1051615 MAIN/loop:  loop60 connected
.oobeacon timeout
rm 0
pm close 7 0 0/1048432360
f 0, 
ctx: cont 
sp: 3ffeaf20 end: 3ffeb2b0 offset: 01b0

>>>stack>>>
3ffeb0d0:  00000002 3ffe9d48 3ffe9d20 4020a663  
3ffeb0e0:  02e7a8c0 40101d1c 00000000 00000003  
3ffeb0f0:  3ffe3710 0037025b 3ffe9d48 3ffe9d48  
3ffeb100:  00000000 3ffe9d20 3ffeb150 4020a734  
3ffeb110:  3fff4ce8 00000003 00000000 3ffeb1f0  
3ffeb120:  3ffeb150 3ffe9d48 3ffe9d20 4020a914  
3ffeb130:  4020fae9 3ffe9d20 3ffeb174 4020fc90  
3ffeb140:  00000001 3ffeb1fc 3ffe9d20 4020a9b7  
3ffeb150:  3ffe93b0 00000001 00000101 3fff5070  
3ffeb160:  0000000e 0000000e 3fff6230 00000016  
3ffeb170:  00000016 3fff4dd0 00000003 00000003  
3ffeb180:  3fff5090 00000000 00000000 3fff50a8  
3ffeb190:  00000000 00000000 0000000f 3fff5050  
3ffeb1a0:  00000003 00000003 3fff50c0 00000016  
3ffeb1b0:  00000016 40101b8f 3ffeb1f0 4020fba8  
3ffeb1c0:  00000000 3ffeb208 3ffeb1f0 4020fbe0  
3ffeb1d0:  3ffe8d5f 00000064 3ffe9d1c 3ffe9db8  
3ffeb1e0:  3ffe9d20 00000000 3ffe9d1c 40203f77  
3ffeb1f0:  3fff4b48 00000003 00000003 3fff4d58  
3ffeb200:  00000016 00000016 3fff4cc8 0000000e  
3ffeb210:  0000000e 3fff5028 00000016 00000016  
3ffeb220:  3ffe9da0 00000001 60000000 40201eb4  
3ffeb230:  00ff0000 ff000000 3ffeb308 00000000  
3ffeb240:  3ffe9d1c 3ffe9974 3ffeb308 40207ff1  
3ffeb250:  3ffe9260 00000000 000003e8 00000003  
3ffeb260:  00000000 00000000 0000000d 0000000d  
3ffeb270:  0000000b 00000000 00000000 40102361  
3ffeb280:  40201e3d 00000000 00000000 3ffeb2dc  
3ffeb290:  3fffdc20 00000000 3ffeb2d4 40201e9a  
3ffeb2a0:  00000000 00000000 3ffea290 40100450  
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a

Updated elf here: https://www.dropbox.com/s/rwi1nf7cznngey7/sens_v2.cpp.elf?dl=0

UPD: I got about 3000 exception lines when the WiFi router was turned on but not yet ready (for about 5 sec, up to the WDT reset):

Fatal exception (0): 
epc1=0x4010f280, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
...

igrr commented 9 years ago

The first one you posted is a WDT reset that occurs when the code spins inside PubSubClient::send_reliably -> PubSubClient::wait_for(uchar,ushort). I think I added a workaround for such a use case, but apparently it got broken. I'll check that, but meanwhile you can add yield(); to the while loop inside PubSubClient::wait_for.
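The shape of that workaround, a wait loop that hands control back on every pass, can be sketched on the host with injected callbacks. This is not the PubSubClient code: waitWithYield and its parameters are hypothetical, and the timeout is an addition of this sketch. On the device, yieldFn would be yield() and nowMs would be millis().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Waits until dataReady() returns true, calling yieldFn() on every pass so
// that background work (WiFi stack, watchdog feeding) gets a chance to run.
// Returns false if timeoutMs elapses first.
bool waitWithYield(const std::function<bool()>& dataReady,
                   const std::function<void()>& yieldFn,
                   const std::function<std::uint32_t()>& nowMs,
                   std::uint32_t timeoutMs) {
    const std::uint32_t start = nowMs();
    while (!dataReady()) {
        // Unsigned subtraction stays correct if the clock wraps around.
        if (static_cast<std::uint32_t>(nowMs() - start) >= timeoutMs) {
            return false;
        }
        yieldFn();
    }
    return true;
}
```

A loop that only polls, without the yieldFn() call, is the kind of busy wait that lets the watchdog fire when the network goes away.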

As for the second issue, it's a mysterious one, because 0x4010f280 is an address inside the bootloader. I'll try to reproduce that.

av1024 commented 9 years ago

In my opinion it is not a good idea to use delayMicroseconds in a loop... I think delay(1) instead of delayMicroseconds(100) is enough here. (Maybe I'm wrong and >=1 ms is too long here.)

... Replacing delayMicroseconds with delay(1) solves the first issue. (I'll create an issue on @Imroy's project.)

The second really was mysterious )) because I can't reproduce it either.

igrr commented 9 years ago

delay(1) is too much a delay here, yield(); should take less time.

If you ever find out how to reproduce the second one reliably, please let me know. I'm sure I have seen such behaviour once or twice but wasn't able to isolate the issue.

sticilface commented 9 years ago

My module does this repeatedly: it crashes when joining the WiFi, before setup has completed. What usually fixes it (and this is what I've been doing all weekend) is to reboot the module 5-6 times, very quickly, by grounding RST. I'd say this fixes it 90% of the time. It's weird, and I do not know what could cause this behaviour.

igrr commented 9 years ago

@sticilface the change in 1c8b52b fixed this for @av1024, perhaps it will also help in your case?

sticilface commented 9 years ago

ah sorry, I shall give it a go!

av1024 commented 9 years ago

delay(1) is too much a delay here, yield(); should take less time.

yield() also works OK. (I'm not sure why the 100 us delay is needed while waiting for network I/O.)

... I have possibly isolated the second issue. This code prints the error message and crashes after the reboot:

wifi_bad_cnt++;
if (wifi_bad_cnt > WIFI_REBOOT_COUNT) {
    Serial.println("Too many connect fails. Try reboot.");
    Serial.flush();
    WiFi.disconnect();
    delay(1000);
    ESP.reset();
}

Too many connect fails. Try reboot.
scandone

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 1464, room 16 
tail 8
chksum 0x7a
csum 0x7a
Fatal exception (0): 
epc1=0x4010f280, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

Moreover, the CLI command reset now produces the same error. The "reset" handler is:

if (s == "reset") {
    prn.println(F("\n\r\n\r *** RESET ***\n\r"));
    delay(1000);
    ESP.reset();
}

igrr commented 9 years ago

Ah right. Looks like ESP.reset() doesn't work at all. Please use

ESP.restart();
delay(5000);

as a workaround until i push a fix for that issue.

av1024 commented 9 years ago

OK. I already have a 'restart' handler and it works (even without the delay).

danbicks commented 9 years ago

Guys, this is excellent work. I have this issue with PubSubClient and thought I was going mad. Adding Serial.setDebugOutput(true); gave me a nice clear indication that it is not a fatal exception error for me. The problem now only occurs for me if MQTT has established a connection to the broker and subscribed to channels. If I take down the wireless router that the ESP8266 is connected to at this point, I get the WDT reset condition. I am thinking that at that moment the class tries to do a keep-alive type ping and bombs out. Please let me know when a fix has been found.

Thanks, Danbicks

BJvDL commented 9 years ago

I had the same problem with pubsubclient and it seems to be solved with this version 1.6.4-835-g77d77e8 ( https://github.com/esp8266/Arduino/issues/324 ) See https://github.com/Imroy/pubsubclient/issues/8

danbicks commented 9 years ago

Thanks buddy, I will try a fixed IP now and then test the version you specified with DHCP afterwards.

Awesome, fingers crossed.

Dans

danbicks commented 9 years ago

You are a hero. So far I have a static IP assigned and no more WDT resets. Superb, so it does seem DHCP related.

Not only that, I have a really good test router that links to my main wireless network and drops the connection several times before it becomes solid. An ideal unit for testing harsh network conditions.

I will now look at my current build of the Arduino IDE, which is 1.6.4. How do I find out the downloaded version, e.g. g77d77e8?

Massive thanks again

Dans

BJvDL commented 9 years ago

See issue #324 for details on how to get the package for the nightly build.

syadykin commented 9 years ago

A very simple sketch runs out of heap:

#include <ESP8266WiFi.h>

void setup() {
  delay(2000); // debugging purposes
  Serial.begin(115200);
  Serial.setDebugOutput(true);
  WiFi.printDiag(Serial);
  WiFi.mode(WIFI_STA);
  // with or without this string result is the same
  WiFi.config({192, 168, 78, 233}, {192, 168, 78, 1}, {255, 255, 255, 0}, {192, 168, 78, 1});
  // correct AP name with wrong pass
  WiFi.begin("atmosphere", "wrong password!");
}

void loop() {
  Serial.println(ESP.getFreeHeap());
  delay(500);
}

The log http://pastebin.com/M80fvhxs

igrr commented 9 years ago

The original issue should be fixed in esp8266-1.6.5-804-g2d340c7. Will also check the out-of-memory issue with wrong password.

thethereza commented 9 years ago

I was having a number of issues with random WDT reboots while trying to connect to the AP, all at different times after startup. What's interesting is that it seems random: it will connect fine for hours, then stop working for hours (WDT reboots), then start working again. The problem seems greatly improved with the latest code (no reboots yet).

BTW, what's the best way to get the Arduino IDE to detect and reload any updates to the Boards Manager .json?

Also, what tools do you use to debug crashes? gdb?

Thanks, Reza

EdwinGH commented 2 years ago

I also had my ESP8266 (NodeMCU) rebooting on WiFi.begin() with the following messages:

 ets Jan 8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 3460, room 16
tail 4
chksum 0xcc

For me the issue was that I used WiFi.h (used in many old examples) instead of ESP8266WiFi.h.
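If a sketch has to build for more than one board family, one way to avoid picking up the wrong header is to select it by target macro. This is a sketch under the assumption that the ESP8266 core defines ESP8266 and the ESP32 core defines ESP32; WIFI_HEADER_IN_USE and wifiHeaderInUse are hypothetical names, added only so the selection can be printed or checked.

```cpp
#include <cassert>
#include <cstring>

// Pick the WiFi header that matches the target instead of hard-coding WiFi.h.
#if defined(ESP8266)
#include <ESP8266WiFi.h>
#define WIFI_HEADER_IN_USE "ESP8266WiFi.h"
#elif defined(ESP32)
#include <WiFi.h>
#define WIFI_HEADER_IN_USE "WiFi.h"
#else
// Neither macro is defined, e.g. when compiling on a desktop machine.
#define WIFI_HEADER_IN_USE "none"
#endif

// Reports which header the preprocessor selected.
const char* wifiHeaderInUse() {
    return WIFI_HEADER_IN_USE;
}
```

Printing wifiHeaderInUse() once at startup makes it obvious which header a given build actually used.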