JosephHewitt / wardriver_rev3

A portable ESP32-based WiFi/Bluetooth scanner for Wigle.net.
https://wardriver.uk
GNU General Public License v3.0

Intermittent "ESP-B NO DATA" issues, reboot usually fixes #77

Open pejacoby opened 1 year ago

pejacoby commented 1 year ago

I regularly get "ESP-B NO DATA" messages on the display, with no BLE or GSM data from side B. Restarting the WiSpy almost always clears the issue.

I have not been able to find a common cause, but it seems more likely to happen when I do not have a network connection for the WiSpy to connect to initially.
It also seems like not getting a quick GPS lock will cause the issue. In these cases I'll see the time displayed as 1970. If I eventually get a GPS lock, the time will update but side B will remain quiet.

I just fired up the unit in my basement and it shows "No GPS: No GSM pos / ESP-B NO DATA". It has a valid time from the network connection. I rebooted, and now have "No GPS: GSM pos OK / BLE: 31 GSM: 1".

Not sure if there are debugs that could be enabled to see more -- I'm on version 1.1.0, flashed via the Arduino IDE.

JosephHewitt commented 1 year ago

Hi,

Thanks for reporting this.

This issue sounds a lot like #23 where this message appears if you turn on the wardriver in an area with a lot of Bluetooth.

It is also normal for this message to appear on first boot for a short while (usually for under a minute), but this can be delayed by the cell tower triangulation feature which causes some temporary slowdowns when in an area without GPS.

You can connect a cable to the ESPs in your wardriver to view information about how they are performing. Note: You should not connect both to your PC simultaneously. You can use the serial monitor in the Arduino IDE, PuTTY (Windows), or screen (Linux/Mac) to view this at a baud rate of 115200. It would be helpful to view the information from B and attempt to reproduce the issue; you should see a lot of information about nearby BLE/WiFi devices if everything is working correctly. You could also do the same with A to see if there is any additional information.
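Since the fault is intermittent, it may help to log the serial output to a file with host timestamps rather than watching a monitor window. The following is a minimal sketch, not part of the wardriver project: `timestamp_lines` and `capture` are hypothetical helper names, and `capture` assumes the third-party pyserial package is installed and that the port name (for example `/dev/ttyUSB0` or `COM3`) is whatever your system assigns.

```python
import datetime
import io


def timestamp_lines(stream, sink):
    """Copy lines from a binary stream to a text sink, prefixing host time.

    Returns the number of lines copied. Stops when the stream returns b"".
    """
    count = 0
    for raw in iter(stream.readline, b""):
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        sink.write(f"{stamp} {text}\n")
        sink.flush()  # keep the file current in case the capture is interrupted
        count += 1
    return count


def capture(port, path):
    """Log one ESP's serial output to a file until interrupted (Ctrl+C)."""
    import serial  # pyserial (third-party); imported here so the helper above works without it

    # 115200 is the baud rate given in this thread. With no timeout set,
    # readline() blocks until a full line arrives.
    with serial.Serial(port, 115200) as ser, open(path, "a", encoding="utf-8") as log:
        timestamp_lines(ser, log)
```

Calling `capture("/dev/ttyUSB0", "side_b.log")` with only Side B connected would leave a timestamped record that can be compared against the moment the display shows the error.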

pejacoby commented 1 year ago

Thanks - I've been trying to catch the issue with serial logs active but of course, it's evasive. I'll keep trying to see what I can capture.

JosephHewitt commented 10 months ago

Hi,

I'm looking to close some issues which have not had activity for a while.

Were you able to gather any more information regarding this?

Thanks.

pejacoby commented 10 months ago

Unfortunately no, and I just had it happen on a drive today. The only thing different about today’s issue is that on the reboot before it happened, the WiSpy started up with “Unexpected reset” and a code that I can’t remember now. Everything booted fine after that was displayed, but I wasn’t paying attention until I was on the road and noticed the “ESP-B NO DATA” message.

Someday I’ll catch it while tethered with serial output active, but it’s a tough one.


JosephHewitt commented 9 months ago

As of #137 (to be released later as v1.2.0b5), extra logging and an additional error message are available, which may help track this down:

ESP-B NO DATA = No data has been received from Side B. Specifically, it is looking for "BLC," messages, which carry the Bluetooth counter and are meant to be sent by Side B every ~2 seconds. These never arriving could be a sign of Side B being stuck at boot, not starting at all, or a hardware fault preventing the communication.

ESP-B RESET = The Side B ESP32 was reset at some point in the past 20 seconds. The reset code is logged to test.txt in this situation. Most likely caused by bad power or a software issue.
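The two messages described above can be modelled as a small state check. This is only an illustrative sketch in Python, not the firmware's code: the class name `SideBMonitor`, the 10-second no-data timeout, and the choice to show the reset message in preference to the no-data message are all assumptions. Only the ~2 second BLC interval and the 20 second reset window come from the description above.

```python
import time

BLC_INTERVAL_S = 2.0                    # from the thread: BLC sent about every 2 seconds
NO_DATA_TIMEOUT_S = 5 * BLC_INTERVAL_S  # assumption: the real timeout is not stated
RESET_WINDOW_S = 20.0                   # from the thread: reset within the past 20 seconds


class SideBMonitor:
    """Tracks when Side B was last heard from and when it last reset."""

    def __init__(self, now=time.monotonic):
        self._now = now          # injectable clock, which makes the logic testable
        self._last_blc = None    # time of the most recent BLC message, if any
        self._last_reset = None  # time of the most recent Side B reset, if any

    def on_blc(self):
        self._last_blc = self._now()

    def on_reset(self):
        self._last_reset = self._now()

    def status(self):
        t = self._now()
        # Assumption: a recent reset takes priority over missing data.
        if self._last_reset is not None and t - self._last_reset <= RESET_WINDOW_S:
            return "ESP-B RESET"
        # No BLC yet (for example just after boot) or BLC messages have stopped.
        if self._last_blc is None or t - self._last_blc > NO_DATA_TIMEOUT_S:
            return "ESP-B NO DATA"
        return "OK"
```

Under this model the no-data message appears immediately at boot and clears on the first BLC message, which matches the brief flash at startup reported in this thread. A Side B that never starts would keep the message up indefinitely.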

After 1.2.0b5 releases, it would be interesting to see if you can reproduce this and then share the extra logging from test.txt. That should show if this is a software exception or something power related, for example.

pejacoby commented 9 months ago

Great additions, looking forward to the beta to test. This issue has not been seen for a while, so having the extra info logged should be valuable when it does appear.

pejacoby commented 8 months ago

Hi again -- I have hit the "ESP-B NO DATA" three times in three days. My test case is driving into the office, parking in our parking ramp, which completely blocks GPS, and firing up the war driver at the end of the day in the ramp. I have no network to connect to, so the boot process goes through trying to connect and fails to do so. I end up on the main scanning screen with ESP-B NO DATA and NO GPS. Side A is actively scanning WiFi.

Once I leave the ramp, I will eventually get a GPS signal (2 to 5 minutes later). On TWO of three occasions, I got all the way home (20 minute drive out in the open) and still had ESP-B NO DATA on display. Today, about 10 minutes into the drive, I noticed I had side B supplying BLE data. Unclear why things managed to sync up today.

Here's output from test.txt if it's helpful:

_BOOT_1.2.0b5, ut=2392285, rr=1, id=3893376

_BOOT_1.2.0b5, ut=2400804, rr=1, id=3893376, bc=315, ep=1702385437, bsh=xxxxx

_BOOT_1.2.0b5, ut=2404874, rr=1, id=3893376, bc=316, ep=0, bsh=xxxxx

_BOOT_1.2.0b5, ut=2422990, rr=1, id=3893376, bc=317, ep=0, bsh=xxxxx

_BOOT_1.2.0b5, ut=2394751, rr=1, id=3893376, bc=318, ep=1702424845, bsh=xxxxx

_BOOT_1.2.0b5, ut=2430024, rr=1, id=3893376, bc=319, ep=1702471726, bsh=xxxxx

_BOOT_1.2.0b5, ut=2360868, rr=1, id=3893376, bc=320, ep=0, bsh=xxxxx

_B-RST_6,ut=1092050,blc=0

_B-RST_6,ut=1101627,blc=0

_BOOT_1.2.0b5, ut=2351921, rr=1, id=3893376
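For anyone comparing several test.txt files, the entries above are regular enough to parse mechanically. The sketch below is a hypothetical helper, not part of the firmware. It assumes one entry per line in the actual file and makes no claim about what fields such as `ut`, `rr`, or `bsh` mean. The `ep` values look like Unix timestamps, with `0` possibly meaning the clock was not yet set, but that is an inference from the numbers rather than something stated here.

```python
import datetime


def parse_log_line(line):
    """Split a test.txt entry into its kind, leading value, and key=value fields.

    Returns None for lines that do not start with an underscore.
    """
    line = line.strip()
    if not line.startswith("_"):
        return None
    head, *pairs = [part.strip() for part in line.split(",")]
    # head looks like "_BOOT_1.2.0b5" or "_B-RST_6"
    kind, _, value = head[1:].partition("_")
    fields = dict(p.split("=", 1) for p in pairs if "=" in p)
    return {"kind": kind, "value": value, "fields": fields}


def epoch_to_utc(ep):
    """Interpret ep as a Unix timestamp (an assumption); 0 is treated as unset."""
    ep = int(ep)
    if ep == 0:
        return None
    return datetime.datetime.fromtimestamp(ep, tz=datetime.timezone.utc)
```

Read this way, `ep=1702385437` on the `bc=315` entry falls on 2023-12-12 UTC, the same date as the rows quoted from file 315, which lends some support to the timestamp interpretation.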

One other item of note - file 315 and 319 both have one line of corrupt output, looking like a collision of side A and side B content. Might be totally unrelated.


319 (some lines before and one after the glitch):

F3:AD:C9:FC:BA:BD,,[BLE],2023-12-13 13:19:31,0,-93,44.9506416,-93.0891037,235.10,2.50,BLE

FC:58:FB:00:7D:25,mini lifejacket,[BLE],2023-12-13 13:19:31,0,-96,44.9506416,-93.0891037,235.10,2.50,BLEF2:9F:C2:FD:B4:95,Creators Space,[WPA2_PSK],2023-12-13 13:19:31,6,-71,44.9506416,-93.0891037,235.10,2.50,WIFI

F0:9F:C2:FD:B4:95,Wextech,[WPA2_PSK],2023-12-13 13:19:31,6,-74,44.9506416,-93.0891037,235.10,2.50,WIFI


315 (some lines before and one after the glitch):

A4:56:CC:3E:36:5C,,[WPA2_PSK],2023-12-12 13:17:28,11,-86,44.9500961,-93.0922165,242.40,2.50,WIFI

88:F0:31:D7:C8:C0,Ramsey,[WPA2],2023-12-12 13:17:28,1,-84,44.9500961,-93.0922165,242.40,2.50,WIFIBA:5E:71:2B:A0:8C,,[WPA2],2023-12-12 13:17:28,11,-92,44.9500961,-93.0922165,242.40,2.50,WIFI

88:F0:31:D7:C8:C1,RamseyVendor,[OPEN],2023-12-12 13:17:28,1,-85,44.9500961,-93.0922165,242.40,2.50,WIFI
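Glitches like the two above, where one row runs into the next without a newline, can be found automatically instead of by eye. The sketch below is a hypothetical check, not a wardriver tool. It assumes intact rows have 11 comma-separated fields, as the intact rows quoted here do, and end in a known type. `GSM` is included as a guess, since only `WIFI` and `BLE` rows appear in this thread. Any header rows at the top of a real file would also be flagged and can be skipped.

```python
import csv

EXPECTED_FIELDS = 11                  # field count of the intact rows quoted above
KNOWN_TYPES = {"WIFI", "BLE", "GSM"}  # GSM is an assumption; see the note above


def find_suspect_rows(lines):
    """Return (row_number, field_count) for rows that do not look intact."""
    suspects = []
    for number, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # blank line
        if len(row) != EXPECTED_FIELDS or row[-1] not in KNOWN_TYPES:
            suspects.append((number, len(row)))
    return suspects
```

Passing an open file object, for example `find_suspect_rows(open(path, newline=""))`, scans a whole log. A merged pair of rows like the ones above shows up with 21 fields instead of 11.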

JosephHewitt commented 5 months ago

Hi,

Sorry for the delay regarding this issue. I have also observed this behavior on one of my stable wardrivers (one with hundreds of hours of usage), but only when using a particular powerbank.

Have you observed this issue when powered from other sources? I suspect this one could be power related, for example if your power supply or power cable can't quite handle the full power spikes of the wardriver.

pejacoby commented 5 months ago

I will need to do some testing with alternate power sources -- my WiSpy build (which like yours has hundreds of run hours) has been running off an Anker PowerCore 10000 for its whole life. I have a couple of other 36.8K packs or straight cigarette-lighter power I can try out in the coming weeks.

revkillj0y commented 4 months ago

I have observed similar behavior when using "Mophie Powerstation 6.2k" (https://fccid.io/ANATEL/04964-16-05669/manual/B97AE7F5-2AFF-4B4D-AD2B-7A7F2DFC1ADA)

In my setup I use a female/male 5.5mm barrel plug: Anker > USB-A-to-5.5mm male > 5.5mm female > board.

I suspect this is an undervoltage issue, at least in my case.

pejacoby commented 4 months ago

So weird, the issue never manifests itself with my Anker 10000 power pack when I fire it up on my kitchen table. But take it to work downtown and fire it up in the parking ramp where GPS signal is negligible and my usual network is inaccessible and boom, there the issue is. I've run it on a 36.8K power pack for about a month now and see the issue pretty consistently when I have no GPS signal and no network connectivity.

revkillj0y commented 4 months ago

Using a "Mophie Powerstation 6.2k" (https://fccid.io/ANATEL/04964-16-05669/manual/B97AE7F5-2AFF-4B4D-AD2B-7A7F2DFC1ADA) I am able to reproduce NO DATA during boot up using both USB-A ports on the power bank

https://github.com/JosephHewitt/wardriver_rev3/assets/35399292/9972f31c-47c2-40f4-979b-0e4271d0ac65

https://github.com/JosephHewitt/wardriver_rev3/assets/35399292/0860a05c-e6fb-4fc0-a785-a38aa47cc3a0

pejacoby commented 4 months ago

That's a bit different from my situation -- in my scenarios my WiSpy never comes out of "ESP-B NO DATA" until I reboot it again. Something affects the sync up such that it doesn't recover, even after 10 or more minutes.

I'll try hooking up my USB tester to monitor for a power surge; I see yours didn't change too much.

JosephHewitt commented 3 months ago

> I am able to reproduce NO DATA during boot up using both USB-A ports on the power bank

This is normal behavior. This is only a problem when that message appears for a long period of time. Essentially what you are observing is the first ESP becoming ready a couple of seconds before the second while things are "warming up".

The message disappearing confirms the second ESP is working correctly, but if it stays on permanently (or for long periods), it means the second ESP never actually booted up or isn't able to communicate for some reason. This could potentially be caused by power problems.

I will investigate improving this error so that it is less likely to flash up briefly when there is nothing actually wrong.