avrhack closed this issue 6 years ago
I feel like I am hitting the same problem. Since yesterday I am facing continuous crashes. Here is an example serial output:
ets Jan 8 2013,rst cause:1, boot mode:(3,3)
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v09826c6d
~ld
�
00:00:00 Project sonoff NewDevice (Topic sonoff, Fallback DVES_453A8D, GroupTopic sonoffs) Version 5.10.0e
00:00:00 WIF: Connecting to AP1 MyOwnIOT in mode 11N as sonoff-6797...
Exception (0):
epc1=0x40000f77 epc2=0x00000000 epc3=0x00000000 excvaddr=0x20203109 depc=0x00000000
ctx: sys
sp: 3ffffe00 end: 3fffffb0 offset: 01a0
>>>stack>>>
3fffffa0: 3ffea660 40000f49 3fffdab0 40000f49
<<<stack<<<
ets Jan 8 2013,rst cause:1, boot mode:(3,3)
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v09826c6d
~ld
�
00:00:00 Project sonoff NewDevice (Topic sonoff, Fallback DVES_453A8D, GroupTopic sonoffs) Version 5.10.0e
ets Jan 8 2013,rst cause:4, boot mode:(3,3)
wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v09826c6d
~ld
�
00:00:00 Project sonoff NewDevice (Topic sonoff, Fallback DVES_453A8D, GroupTopic sonoffs) Version 5.10.0e
00:00:00 WIF: Connecting to AP1 MyOwnIOT in mode 11N as sonoff-6797...
Exception (0):
epc1=0x402119e3 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000
ctx: cont
sp: 3fff2600 end: 3fff2830 offset: 01a0
>>>stack>>>
3fff27a0: 00000001 00000001 3fff107c 40107b81
3fff27b0: 3fff3e9c 3fff1030 00000062 40213644
3fff27c0: 3fff0ce4 3fff1030 00000062 4021fbc9
3fff27d0: 3fff107c 3fff0c40 3fff1810 00000000
3fff27e0: 3fffdad0 3fff1800 40202f78 3fff1810
3fff27f0: 3fff0d70 00000062 3fff17f9 3fff1800
3fff2800: 3fffdad0 00000000 3fff107c 40224002
3fff2810: 3fffdad0 00000000 3fff17f9 40202fc4
3fff2820: feefeffe feefeffe 3fff1810 40204028
<<<stack<<<
ets Jan 8 2013,rst cause:4, boot mode:(3,3)
wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v09826c6d
~ld
�
00:00:00 Project sonoff NewDevice (Topic sonoff, Fallback DVES_453A8D, GroupTopic sonoffs) Version 5.10.0e
ets Jan 8 2013,rst cause:4, boot mode:(3,3)
If I try to run with WIFI_MANAGER and a non-existent STA_SSID1, I get:
00:00:00 Project sonoff NewDevice (Topic sonoff, Fallback DVES_453A8D, GroupTopic sonoffs) Version 5.10.0e
ets Jan 8 2013,rst cause:4, boot mode:(3,7)
wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v09826c6d
~ld
�
00:00:00 Project sonoff NewDevice (Topic sonoff, Fallback DVES_453A8D, GroupTopic sonoffs) Version 5.10.0e
ets Jan 8 2013,rst cause:4, boot mode:(3,7)
wdt reset
load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v09826c6d
~ld
�
00:00:00 Project sonoff NewDevice (Topic sonoff, Fallback DVES_453A8D, GroupTopic sonoffs) Version 5.10.0e
ets Jan 8 2013,rst cause:4, boot mode:(3,7)
@avrhack I'm using Visual Studio and PlatformIO. This morning pio was updated to 0.6.0/3.5.0 and my default compile used the staged version where it should have used the released version.
I had to change platformio.ini to tell PlatformIO to use the released version, by changing the line
platform = espressif8266
into
platform = espressif8266@1.5.0
The compiled result then runs fine on my hardware...
@Big4SMK if your exceptions differ every time (both number and epc1) you might have a hardware fault.
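One way to check this against a captured serial log is to pull out the exception number and epc1 of every crash and see whether they repeat. The following is only a minimal sketch, assuming the log uses the ESP8266 crash format shown in the output above; the helper names `parse_crashes`, `parse_reset_causes` and `crashes_differ` are made up for illustration and are not part of the Sonoff firmware or PlatformIO.

```python
import re

# "Exception (0):" followed on the next line by "epc1=0x40000f77 ..."
CRASH_RE = re.compile(r"Exception \((\d+)\):\s*epc1=(0x[0-9a-fA-F]+)")
# Boot ROM banner, e.g. "rst cause:4, boot mode:(3,3)"
RESET_RE = re.compile(r"rst cause:(\d+)")


def parse_crashes(log):
    """Return (exception_number, epc1) for every crash, in log order."""
    return [(int(num), int(epc1, 16)) for num, epc1 in CRASH_RE.findall(log)]


def parse_reset_causes(log):
    """Return the numeric 'rst cause' of every boot banner, in log order."""
    return [int(cause) for cause in RESET_RE.findall(log)]


def crashes_differ(crashes):
    """True if at least two crashes have a different (number, epc1) pair."""
    return len(set(crashes)) > 1


if __name__ == "__main__":
    # Shortened sample taken from the serial output earlier in this thread.
    sample = (
        "Exception (0):\n"
        "epc1=0x40000f77 epc2=0x00000000 epc3=0x00000000\n"
        "ets Jan 8 2013,rst cause:1, boot mode:(3,3)\n"
        "Exception (0):\n"
        "epc1=0x402119e3 epc2=0x00000000 epc3=0x00000000\n"
        "ets Jan 8 2013,rst cause:4, boot mode:(3,3)\n"
    )
    crashes = parse_crashes(sample)
    for number, epc1 in crashes:
        print(f"exception {number} at epc1={epc1:#010x}")
    print("reset causes:", parse_reset_causes(sample))
    print("crashes differ:", crashes_differ(crashes))
```

A differing epc1 on its own does not prove a hardware fault; the sketch only makes the comparison easier when the log is long.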
@Big4SMK those are exactly the same two exceptions I was getting - hardware exception and watchdog restarts. But it's been rock solid using a firmware compiled yesterday on the very same hardware so I think you have hit exactly the same issue as me. I'm going to try and troubleshoot which library is broken but that will take some time.
@arendst I don't think it is a hardware issue for Big4SMK, for the reasons above. I will put the broken firmware on a less critical device in the next hour or two and see if it fails, then I'll make your change and confirm whether it fixes things. At least it's a fairly hard fault that appears in minutes, so it's easier to troubleshoot than these damned once-in-a-blue-moon ones! Oh, and BTW, great job on your firmware - absolutely love it, thanks.
For me, setting platform = espressif8266@1.5.0 doesn't help, so it could very well be something else that just happened to coincide with the update. I am surprised about it though, as I have been running fine for about two weeks now. Is there any way for me to determine whether it is actually a hardware issue? I have been adding Home Assistant discovery code, but I disabled that to rule out my changes as the culprit.
Only two ways I can think of. Firstly, if you have a working firmware from e.g. yesterday, then flash it and see if that solves the issue. Secondly, and/or if you don't have that, revert Atom/PlatformIO from a backup. I'm on a Mac, so Time Machine makes that really easy to do with regular snapshots, thank goodness.
I've made the change and so far things seem steady. However, I'm a little surprised that a STATUS2/STATUSFWR still gives the SDK as "1.5.3(aec24ac9)", so I'm not sure it has actually done anything, given that's an obvious git commit reference.
@arendst Is that what you'd expect, Theo?
@Big4SMK Did you do a completely clean build, i.e. 'clean' things first (and on macOS/Linux delete the .pioenvs directory just to be absolutely sure)? If not, I recommend you retry with that. And sorry if I'm teaching granny to suck eggs, but did you change the build number so you can be absolutely sure your newly built firmware is actually running on the device? As I'm sure you know, OTA will only pick up changed firmware versions.
Or option 3: try a brand new device.
It seems like I have got rid of the issues with both devices now. I had resorted to powering my first device through my USB TTL converter so I could flash it over serial instead of OTA/web. However, the USB TTL converter cannot supply enough power for the devices, which led to (additional) random reboots. The idea occurred to me when I saw the same problems with the new device hooked up to USB. I hooked the newly flashed device up to 230V and everything seemed stable again with @arendst's proposed build changes. When I hooked up my first device to 230V it started working again as well.
Sorry for confusing myself as well as you by changing multiple parameters during troubleshooting.
@Big4SMK Excellent news!
Shame you had the hassle, but when you're troubleshooting it's always difficult not to change more than one thing. It was actually nice to know it wasn't just me with this problem - you always wonder if there's something you've changed that you can't remember :)
Main thing is that you're working again, as am I, and a big thanks to Theo for the very quick response - saved me a load of pain reverting the PlatformIO/Atom changes from a backup!
I will close this ticket now.
I stupidly left Atom set to auto-update and it installed a load of updates this morning. I then installed the newly compiled firmware on a new module and it was crashing all over the place, with no source code changes from a 100% working version. I spent ages fooling around with power etc., as that's the usual cause, eliminated everything else, then dug out the firmware compiled yesterday on Atom before the updates.
Sure enough that version from yesterday is rock solid.
So now I can't compile working firmware from source anymore, as the Atom/PlatformIO updates have screwed something up. Obviously not everyone will hit this, as it depends on the modules installed/in use. For me that's OneWire and a version of INA219 that I cloned and use to drive several INA226s.
Anyone else seen this?