esp8266 / Arduino

ESP8266 core for Arduino
GNU Lesser General Public License v2.1

Exception (28) in <lmacProcessAckTimeout>: #1329

Closed PyBerger closed 8 years ago

PyBerger commented 8 years ago

Hi,

I'm working on an application which is pretty demanding, timing-wise.

Basically it detects some random activity and, when that happens, decodes the received data using GPIO interrupts occurring at a period of approximately 6.4 us, each taking about 4 us to be serviced.

In the Arduino loop, this data is sent over WiFi using UDP. WiFi is configured as an AP and the clients (Android apps) connect to it.

When no client is connected, this works well. When one or more clients are connected, I get random crashes in lmacProcessAckTimeout that are pretty easy to reproduce (it takes a few seconds):

Exception (28):
epc1=0x40102ebf epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000024 depc=0x00000000

ctx: cont
sp: 3fff37b0 end: 3fff3ae0 offset: 01a0

>>>stack>>>
3fff3950:  402013a8 00000030 00000018 ffffffff
3fff3960:  4010300a 00000023 3fff5028 40201e38
3fff3970:  40104b90 000c0200 3ffef28c 40201ef4
3fff3980:  00040000 40106594 40032203 00000022
3fff3990:  4000050c 3fffc278 40104880 3fffc200
3fff39a0:  00000022 40201e5c 3fffc258 4000050c
3fff39b0:  40101557 00000030 0000000e ffffffff
3fff39c0:  40201309 00000000 3ffe9a28 00000001
3fff39d0:  00000000 00000000 00000000 ffdfffff
3fff39e0:  ffffffff 3fffc6fc 00000001 3ffeeb04
3fff39f0:  3ffeeae5 3fffdc20 3fff2aac 00000030
3fff3a00:  3ffeeae5 3fffdc20 3fff2aac 00000030
3fff3a10:  ffffffff 3fffc6fc 00000001 3fff2ac0
3fff3a20:  00000000 3fffdc20 3fff2aac 00000030
3fff3a30:  3fff4bdc 3fff4818 0000115c 3fff5d58
3fff3a40:  3fff3b2c 00000027 3ffe84d3 3fff2aac
3fff3a50:  40205c59 3fff5cf0 3fff4c00 4020932d
3fff3a60:  3fff5cf0 3fff4bdc 3fff3b2c 3fff2aac
3fff3a70:  000001f0 00000000 3fff4bb8 40202ce4
3fff3a80:  000266c8 3ffeeac4 3ffeea98 4020234c
3fff3a90:  3ffe8818 0204a8c0 3ffe8818 0204a8c0
3fff3aa0:  00000000 00000000 00000016 4010158d
3fff3ab0:  40203291 3ffeeae5 3ffeeb04 40202408
3fff3ac0:  3fffdc20 00000000 3fff2aa4 402032b9
3fff3ad0:  00000000 00000000 3fff2ac0 40100114
<<<stack<<<

I can't figure out what the problem is.

Below are extracts of the code:

void loop()
{
  // check if there is something to push on the ethernet
  // ---------------------------------------------------
  if (read_trpder_pos != write_trpder_pos)
  {
    manageTransponderHits();
    read_trpder_pos++;
    if (read_trpder_pos == RECEIVE_BUFFER_SIZE)
    {
      read_trpder_pos = 0;
    }
[..]

manageTransponderHits does:

unsigned char udpBuffer[16];
IPAddress myIP(192,168,4,2);

// Fill in the buffer 
[...]

udpTransponder.beginPacket(myIP, UDP_TRANSPONDER_PORT);
udpTransponder.write(udpBuffer, 16);
udpTransponder.endPacket();

me-no-dev commented 8 years ago

So you are sending UDP packets every 7 us (microseconds)? I do not think the ESP can handle sending packets that often. You can try, with fingers crossed, to send a packet every millisecond, but not more often than that.

PyBerger commented 8 years ago

I will try sending longer packets less often and see how it works.

I'll keep you posted.
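
For readers following along, the idea of sending longer packets less often could be sketched roughly as below. This is only an illustration, not the code used in this issue: `HitBatcher`, `kRecordSize`, `kRecordsPerPacket` and the flush interval are hypothetical names and values. The 16-byte record size mirrors `udpBuffer` in the extract above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sizes: 16-byte records (as in udpBuffer above),
// 32 records per datagram, i.e. 512 bytes of payload.
constexpr std::size_t kRecordSize = 16;
constexpr std::size_t kRecordsPerPacket = 32;

// Hypothetical helper, not part of the ESP8266 core: collects fixed-size
// records and tells the caller when a batch should be sent.
class HitBatcher {
public:
    explicit HitBatcher(std::uint32_t flushIntervalMs)
        : flushIntervalMs_(flushIntervalMs) {}

    // Append one record. Returns false if the batch is already full,
    // in which case the caller should send and call markFlushed() first.
    bool add(const std::uint8_t* record) {
        if (count_ == kRecordsPerPacket) return false;
        std::memcpy(buffer_.data() + count_ * kRecordSize, record, kRecordSize);
        ++count_;
        return true;
    }

    // True when there is data and either the batch is full or the
    // flush interval has elapsed (unsigned subtraction handles wrap-around).
    bool shouldFlush(std::uint32_t nowMs) const {
        if (count_ == 0) return false;
        if (count_ == kRecordsPerPacket) return true;
        return static_cast<std::uint32_t>(nowMs - lastFlushMs_) >= flushIntervalMs_;
    }

    const std::uint8_t* data() const { return buffer_.data(); }
    std::size_t size() const { return count_ * kRecordSize; }

    // Call once the payload has been handed to the network stack.
    void markFlushed(std::uint32_t nowMs) {
        count_ = 0;
        lastFlushMs_ = nowMs;
    }

private:
    std::array<std::uint8_t, kRecordSize * kRecordsPerPacket> buffer_{};
    std::size_t count_ = 0;
    std::uint32_t lastFlushMs_ = 0;
    std::uint32_t flushIntervalMs_;
};
```

In a sketch like this, loop() would call add() for each pending hit and, whenever shouldFlush(millis()) is true, send data() and size() with the same beginPacket / write / endPacket sequence shown earlier, followed by markFlushed(millis()).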

PyBerger commented 8 years ago

I did some more testing: I'm now sending a 512-byte UDP packet every 500 ms. The exception is much rarer but still happens randomly...

Exception (28):
epc1=0x4010327f epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000024 depc=0x00000000

ctx: cont
sp: 3fff7de0 end: 3fff8110 offset: 01a0

>>>stack>>>
3fff7f80:  402013a8 00000001 4020d4d4 40201cb8
3fff7f90:  401033ca 00080000 00000002 40201cb8
3fff7fa0:  40105079 000c0200 3fff01b8 40201d66
3fff7fb0:  00040000 40201cd8 40032203 4000050c
3fff7fc0:  4000050c 3fffc278 40104d60 3fffc200
3fff7fd0:  00000022 40201cd8 3fffc258 4000050c
3fff7fe0:  40106889 00000030 0000000f ffffffff
3fff7ff0:  4020132c 00000000 03e80000 00000010
3fff8000:  00000005 0579a82b 00100000 ffdfffff
3fff8010:  ffffffff 3fffc6fc 00000001 00000000
3fff8020:  3ffef2a0 3fffdc20 3fff70e8 00000030
3fff8030:  00000000 3fffdc20 3fff70e8 00000030
3fff8040:  ffffffff 3fffc6fc 00000001 3fff70f0
3fff8050:  00000000 3fffdc20 3fff70e8 00000030
3fff8060:  3fff8100 0000000c 3ffe8550 3fff4328
3fff8070:  00016648 00000004 3fff815c 40203b85
3fff8080:  3ffe879c 0001c770 3fff815c 3fff4328
3fff8090:  00016648 0001c770 3fff815c 40203e80
3fff80a0:  00016648 0001c770 3fff815c 402040f5
3fff80b0:  00016648 0001c770 3fff815c 40202220
3fff80c0:  3ffef290 0204a8c0 000003e8 40201937
3fff80d0:  00000000 00000000 00000016 3fff70e8
3fff80e0:  3fffdc20 00000000 3ffef29c 40202356
3fff80f0:  3fffdc20 00000000 3fff70e0 402031ed
3fff8100:  00000000 00000000 3fff70f0 40100114
<<<stack<<<

Any idea how to track this down further? The crash is still in lmacProcessAckTimeout.

pjsg commented 8 years ago

I'm also seeing the same crash with the 1.5 SDK in the NodeMCU code (at offset 39 decimal in lmacProcessAckTimeout). My application hardly does any network I/O (maybe two connections per minute). It takes a couple of hours to crash...

There are other people who are also experiencing the same crash in various other versions of the SDK (at the same offset in lmacProcessAckTimeout). I'll bet that this is in the handling of ACK timeouts (i.e. when the AP doesn't acknowledge a packet transmitted by the esp8266 client).

PyBerger commented 8 years ago

I would place the same bet, except for the very end of your sentence :) since in my application I use the ESP8266 as the AP...

Is there any way to further decode the stack trace and figure out how we end up here?
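
One possible starting point, sketched here as an illustration only: pull every word out of the dump that looks like a code address, then resolve those addresses against the sketch's ELF file with a tool such as xtensa-lx106-elf-addr2line. The address ranges below are assumptions about the ESP8266 memory map (IRAM and flash-mapped code), and `looksLikeCodeAddress` and `candidateReturnAddresses` are hypothetical helper names. Not every matching word is a real return address, so the result is a list of candidates rather than a true backtrace.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Assumed ESP8266 code ranges (an assumption, check your linker script):
// IRAM at 0x40100000..0x40108000, flash-mapped code at 0x40200000..0x40300000.
bool looksLikeCodeAddress(std::uint32_t v) {
    return (v >= 0x40100000u && v < 0x40108000u) ||
           (v >= 0x40200000u && v < 0x40300000u);
}

// Parse dump lines of the form "3fff3950:  402013a8 00000030 ..." and return,
// in stack order, every 8-digit hex word that falls into a code range.
std::vector<std::uint32_t> candidateReturnAddresses(const std::string& dump) {
    std::vector<std::uint32_t> result;
    std::istringstream lines(dump);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // e.g. ">>>stack>>>"
        std::istringstream words(line.substr(colon + 1));
        std::string word;
        while (words >> word) {
            if (word.size() != 8) continue;
            if (word.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos) continue;
            const auto value = static_cast<std::uint32_t>(std::stoul(word, nullptr, 16));
            if (looksLikeCodeAddress(value)) result.push_back(value);
        }
    }
    return result;
}
```

Fed the first two stack lines of the dump at the top of this issue, this would pick out 402013a8, 4010300a and 40201e38 as candidates to look up.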

I have run several tests and can often reproduce this: with UDP, with TCP, with a lot of data being sent, or with practically nothing being sent at all...

Pretty frustrating as I need my application to run (and not reboot) for at least 7 to 8 hours... and so far, it isn't guaranteed at all.

pjsg commented 8 years ago

Ah -- interesting. I did file a bug with Espressif about this, but they rejected it as my example was NodeMCU-based. It looks as though your example is much simpler...

Maybe you can file a ticket with them? See http://espressif.com/bug-bounty/ -- it also appears that they might pay for getting one. Let me know what happens......

PyBerger commented 8 years ago

Gave it a try, let's see what comes of it.

pjsg commented 8 years ago

Did they accept your bug report?

PyBerger commented 8 years ago

Not at first. I replied back asking for more, saying it wasn't NodeMCU but custom hardware.

I've been waiting for their response since then.

PyB

PyBerger commented 8 years ago

Got a response :

Hi,

Please have a try with our latest ESP8266_NONOS_SDK_V1.5.1 http://bbs.espressif.com/viewtopic.php?f=46&p=5315 .

If your problem is still unsolved, please feel free to let us know.

Regards,
________________________________________
bugbounty@espressif.com

Haven't yet had time to do extensive testing with the 1.5.1 SDK...

Mewiss commented 8 years ago

Hi! I'm running an interrupt and I have the same exception... Did you solve it?

pjsg commented 8 years ago

You just need to upgrade to the latest SDK (1.5.1?) -- that solves it.

Links2004 commented 8 years ago

SDK 1.5.1 is used in 2.1.0-rc2 and git version.

igrr commented 8 years ago

The latest release, 2.1.0, uses SDK 1.5.1, so I'm closing this one.

hemangjoshi37a commented 6 years ago

Where can I get the 2.1.0 SDK? Can you please provide a link? It would be really helpful. Thanks.

devyte commented 6 years ago

PR #3215. Please google how to test a PR locally.