cyberman54 / ESP32-Paxcounter

Wifi & BLE driven passenger flow metering with cheap ESP32 boards
https://cyberman54.github.io/ESP32-Paxcounter/

Can running continuous WiFi scanning damage a LoPy4? #782

Closed tjvandam closed 3 years ago

tjvandam commented 3 years ago

I am not sure why this is happening, but when I run Paxcounter on a Pycom LoPy4 (in a Pycom Expansion Board 3) powered over USB, the LoPy4 gets damaged beyond repair. It looks like an overheating issue to me, but that is hard to confirm: it ran for 2 weeks without issues on the first run and only 1 day on the second run.

It runs WiFi scan only in continuous mode. No sleep.

spmrider commented 3 years ago

I have had several LoPy4 running continuously for more than half a year, and none of them has shown this behaviour so far. They run in cyclic mode, though, WiFi only at the moment; I will switch to BLE scanning shortly.

tjvandam commented 3 years ago

I have the feeling it may also have to do with the Expansion Board 3. Could it cause damage if I did not enable some of these lines? https://github.com/cyberman54/ESP32-Paxcounter/blob/master/src/hal/lopy4.h#L38

Another reason could be moisture or condensation shorting some of the pins: in both cases the LoPy4 stopped working in the middle of the night. If this is the case, the Pycom boards are very sensitive, since the device sits in a large industrial IP67 enclosure with a vent plug, so no large amounts of moisture should be able to get in.

cyberman54 commented 3 years ago

@tjvandam you can query the device's ESP32 CPU core temperature by sending the downlink rcommand 0x81. The device answers with some status information, including temperature and uptime. In my experience, the CPU core temperatures shown are about 30-40°C above ambient temperature, so you can expect values between 40 and 80°C, depending on your casing/cooling.
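As a rough illustration of that rule of thumb, the check below tests whether a reported core temperature falls inside the band expected for a given ambient range. It is only a sketch: the function name `core_temp_plausible` is hypothetical, and the 30 and 40 degree offsets are the experience values quoted in this comment, not datasheet limits.

```c
#include <stdbool.h>

/* Rule of thumb from this thread: the core runs roughly 30 to 40 degrees C
   above ambient. These offsets are experience values, not hardware limits. */
#define CORE_OFFSET_MIN_C 30
#define CORE_OFFSET_MAX_C 40

/* Returns true if core_c lies within the band expected for an ambient
   temperature somewhere between ambient_lo_c and ambient_hi_c. */
bool core_temp_plausible(int core_c, int ambient_lo_c, int ambient_hi_c) {
    int lo = ambient_lo_c + CORE_OFFSET_MIN_C;
    int hi = ambient_hi_c + CORE_OFFSET_MAX_C;
    return core_c >= lo && core_c <= hi;
}
```

For example, `core_temp_plausible(45, 10, 15)` is true, because the expected band for 10-15°C ambient is 40-55°C. A reading far above the band is a reason to look at power supply and cooling.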

I have run paxcounter on a lot of different devices, including those from Pycom, and I have never damaged a module by software. Do you have more evidence that your board died from running paxcounter and not from, e.g., a hazard caused by your power supply?

tjvandam commented 3 years ago

I am running it on USB via the Expansion Board 3 using a Voltaic V88 power pack. There should be no strange power issues from this power supply, so the only thing that I can think of is moisture or a wrong use of the code by me. No clue how to debug because it happens so randomly.

cyberman54 commented 3 years ago

Can you be more precise about "the LoPy4 gets damaged beyond repair"? What no longer works, and why do you suspect the LoPy4 is hardware damaged? Did you test another LoPy4 on the same expansion board? With how many LoPy4 boards did you run into this issue, a single board or more?

tjvandam commented 3 years ago

So I finally got some more time to test this.

Running the same code on multiple new Expansion Board 3 units and a new LoPy4 ends in the same kind of failure. They stop working at some random point (after a few hours or days); in one case the LoPy4 is broken beyond repair, and in another case the Expansion Board 3 is broken beyond repair. You can smell that something has burned, and in the case of the broken Expansion Board 3 you can clearly see overheating and burning of components around the board's power supply. At first I suspected a short circuit due to moisture build-up, but even after treatment with conformal coating the same thing happened. I have now even tested the code on a LoPy4 alone (without the Expansion Board), powered directly with USB 5V (GND and VIN pins), and again after 2 hours the whole thing stopped working and the board felt completely overheated.

How can this happen? Is the LoPy4 not capable of running this program with continuous scanning like this? I don't think so, but I have no clue how to debug this further without damaging more and more LoPy4s.

cyberman54 commented 3 years ago

What type of power supply do you use? Did you request the core temperature (rcommand 0x81) and have some values?

tjvandam commented 3 years ago

I am running it on a V88 Portable Laptop Battery: https://voltaicsystems.com/v88/

tjvandam commented 3 years ago

I could not request the device status, since the device was already broken. What is a normal / max temperature you would expect?

tjvandam commented 3 years ago

Starting a fresh one, I get the following status info:

battery: 0 memory: 4194000 reset0: 1 reset1: 14 temp: 59 uptime: 602
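For logging such replies over time, the status line can be split into its fields. The sketch below assumes the text layout is exactly as pasted above (which is how the reply was displayed here, not necessarily the raw payload format); the names `pax_status_t` and `parse_status` are hypothetical.

```c
#include <stdio.h>

/* Fields as they appear in the status line pasted above. Units are not
   stated in the thread, so none are assumed here. */
typedef struct {
    int battery;
    long memory;
    int reset0;
    int reset1;
    int temp;
    long uptime;
} pax_status_t;

/* Parses one status line; returns 1 on success, 0 if any field is missing. */
int parse_status(const char *line, pax_status_t *out) {
    int n = sscanf(line,
                   "battery: %d memory: %ld reset0: %d reset1: %d temp: %d uptime: %ld",
                   &out->battery, &out->memory, &out->reset0,
                   &out->reset1, &out->temp, &out->uptime);
    return n == 6;
}
```

Feeding it the line above fills `temp` with 59 and `uptime` with 602, so successive readings can be compared to see whether the temperature climbs.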

cyberman54 commented 3 years ago

On our LoPy4 fleet, running on the same expansion board, cased in the Pycom case and installed indoors, we see CPU core temperatures of 40-50°C, which is fairly low. We have another installation in a heated environment, running the same software but on a custom ESP32 board, not a LoPy4. There we see core temperatures of 75°C. Both installations have been running for > 100 days, and we have never had heat issues or hardware damage.

We're not using power banks, but dumb USB chargers for power supply.

I would suggest that you either change your power supply to a dumb USB 5V charger (i.e. one without any USB charging intelligence) or switch to a power bank model which provides a dumb 5V USB output. You can enforce this with a USB "charge only" cable, which cuts off the D+/D- lines.

The USB on the Pycom expansion board is driven by an MCU (PIC) with Pycom-specific firmware. I don't know what this does, but maybe there is a USB handshake between the PIC and your power bank which results in some kind of overvoltage issue?

cyberman54 commented 3 years ago

59°C core temp @ 20-25°C room temp seems OK. But watch whether this value climbs in your application. If it does, you probably have a power supply problem.

tjvandam commented 3 years ago

Super strange this.... I have replaced the USB cable with only + and - now, so there should be no handshake anymore. Let's see what happens. Are you also running them in continuous WiFi scan mode like me? So no sleep?

cyberman54 commented 3 years ago

I peeked into the link of your power bank. You are using a model which provides USB power delivery, which means the device can supply voltages > 5V: "Hi-Voltage Laptop Port (12V, 16V, 19V or 24V)"

Maybe this happens, for some reason, when you connect it to a LoPy4.

What if you run any other kind of software on the LoPy4 with the same power setup? Did you test this?

--

And: Yes, our LoPy4 fleet is running continuously 24/7 in WiFi sniffing mode without sleep.

tjvandam commented 3 years ago

I am not using the High Voltage port, I am using the USB port, so this should not be the issue.

Are you willing to share your working code, so I can compare if there is some other issue?

cyberman54 commented 3 years ago

The working code is the code of this repository, no differences.

cyberman54 commented 3 years ago

PS: You may paste your lopy4.h and paxcounter.conf here, and I will compare them.

cyberman54 commented 3 years ago

PPS: I would suggest raising this issue with Pycom. I think it doesn't matter whether this issue is caused by running the paxcounter software (which I still don't expect): the hardware design of the LoPy4 should never allow any kind of software to burn the hardware. That would be bad hardware design.

Edit: I posted the issue on the Pycom forum, requesting feedback.

tjvandam commented 3 years ago

lopy4.h

```c
// clang-format off
// upload_speed 921600
// board lopy4

#ifndef _LOPY4_H
#define _LOPY4_H

#include

// Hardware related definitions for Pycom LoPy4 Board

#define HAS_LORA 1 // comment out if device shall not send data via LoRa

//#define HAS_SPI 1 // comment out if device shall not send data via SPI
// pin definitions for local wired SPI slave interface
//#define SPI_MOSI GPIO_NUM_22
//#define SPI_MISO GPIO_NUM_33
//#define SPI_SCLK GPIO_NUM_26
//#define SPI_CS GPIO_NUM_36

#define CFG_sx1276_radio 1

#define HAS_LED NOT_A_PIN // LoPy4 has no on board mono LED, we use on board RGB LED
#define RGB_LED_COUNT 1 // we have 1 LEDs
#define HAS_RGB_LED SmartLed rgb_led(LED_WS2812, RGB_LED_COUNT, GPIO_NUM_0) // WS2812B RGB LED on GPIO0 (P2)
#define BOARD_HAS_PSRAM // use extra 4MB extern RAM

// Note: Pins for LORA chip SPI interface come from board file pins_arduino.h

// select WIFI antenna (internal = onboard / external = u.fl socket)
#define HAS_ANTENNA_SWITCH (21) // pin for switching wifi antenna (P12)
#define WIFI_ANTENNA 0 // 0 = internal, 1 = external

// uncomment defines in this section ONLY if your LoPy lives on a PYTRACK BOARD
// #define HAS_GPS 1
// #define GPS_I2C GPIO_NUM_25, GPIO_NUM_26 // SDA (P22), SCL (P21)
// #define GPS_ADDR 0x10

// uncomment defines in this section ONLY if your LoPy lives on a EXPANSION BOARD
//#define HAS_LED (12) // use if LoPy is on Expansion Board, this has a user LED
//#define LED_ACTIVE_LOW 1 // use if LoPy is on Expansion Board, this has a user LED
//#define HAS_BUTTON (13) // user button on expansion board
//#define BUTTON_PULLUP 1 // Button need pullup instead of default pulldown
//#define BAT_MEASURE_ADC ADC1_GPIO39_CHANNEL // battery probe GPIO pin -> ADC1_CHANNEL_7
//#define BAT_VOLTAGE_DIVIDER 2 // voltage divider 1MOhm/1MOhm -> expansion board 3.0
//#define BAT_VOLTAGE_DIVIDER 4 // voltage divider 115kOhm/56kOhm -> expansion board 2.0

#endif
```

paxcounter.conf

```c
// clang-format off

// ----- Paxcounter user config file ------
//
// --> adapt to your needs and use case <--
//
// Note: After editing, before "build", use "clean" button in PlatformIO!

// Verbose enables additional serial debug output
#define VERBOSE 0 // set to 0 to silence the device, for mute use build option

// Payload send cycle and encoding
#define SENDCYCLE 150 // payload send cycle [seconds/2], 0 .. 255
#define SLEEPCYCLE 0 // sleep time after a send cycle [seconds/2], 0 .. 255; 0 means no sleep [default = 0]
#define PAYLOAD_ENCODER 1 // payload encoder: 1=Plain, 2=Packed, 3=Cayenne LPP dynamic, 4=Cayenne LPP packed
#define COUNTERMODE 0 // 0=cyclic, 1=cumulative, 2=cyclic confirmed

// MAC sniffing parameters
#define MACFILTER 1 // set to 0 if you want to scan all devices, 1 to scan only devices with random MACs (aka smartphones) [default = 1]
#define BLECOUNTER 0 // set to 0 if you do not want to install the BLE sniffer
#define WIFICOUNTER 1 // set to 0 if you do not want to install the WIFI sniffer
#define MAC_QUEUE_SIZE 100 // size of MAC processing buffer (number of MACs) [default = 50]

// BLE scan parameters
#define BLESCANTIME 0 // [seconds] scan duration, 0 means infinite [default], see note below
#define BLESCANWINDOW 80 // [milliseconds] scan window, see below, 3 .. 10240, default 80ms
#define BLESCANINTERVAL 80 // [milliseconds] scan interval, see below, 3 .. 10240, default 80ms = 100% duty cycle

// Corona Exposure Notification Service(ENS) counter
#define COUNT_ENS 0 // count found number of devices which advertise Exposure Notification Service
                    // set to 1 if you want to enable this function [default=0]

// for additional sensors (added by some user)
#define HAS_SENSOR_1 0 // set to 1 to enable data transfer of user sensor #1 (also used as ENS counter) [default=0]
#define HAS_SENSOR_2 0 // set to 1 to enable data transfer of user sensor #2 [default=0]
#define HAS_SENSOR_3 0 // set to 1 to enable data transfer of user sensor #3 [default=0]

// Note: guide for setting bluetooth parameters

// WiFi scan parameters
#define WIFI_CHANNEL_MIN 1 // start channel number where scan begins
#define WIFI_CHANNEL_MAX 13 // total channel number to scan
#define WIFI_MY_COUNTRY "EU" // select locale for Wifi RF settings
#define WIFI_CHANNEL_SWITCH_INTERVAL 50 // [seconds/100] -> 0,5 sec.

// LoRa payload default parameters
#define MEM_LOW 2048 // [Bytes] low memory threshold triggering a send cycle
#define RETRANSMIT_RCMD 5 // [seconds] wait time before retransmitting rcommand results
#define PAYLOAD_BUFFER_SIZE 51 // maximum size of payload block per transmit
#define PAYLOAD_OPENSENSEBOX 0 // send payload compatible to sensebox.de (swap geo position and pax data)
#define LORADRDEFAULT 5 // 0 .. 15, LoRaWAN datarate, according to regional LoRaWAN specs [default = 5]
#define LORATXPOWDEFAULT 14 // 0 .. 255, LoRaWAN TX power in dBm [default = 14]
#define MAXLORARETRY 0 // maximum count of TX retries if LoRa busy
#define SEND_QUEUE_SIZE 1 // maximum number of messages in payload send queue [1 = no queue]

// Hardware settings
#define RGBLUMINOSITY 0 // RGB LED luminosity [default = 30%]
#define DISPLAYREFRESH_MS 0 // OLED refresh cycle in ms [default = 40] -> 1000/40 = 25 frames per second
#define DISPLAYCONTRAST 0 // 0 .. 255, OLED display contrast [default = 80]
#define DISPLAYCYCLE 0 // Auto page flip delay in sec [default = 2] for devices without button
#define HOMECYCLE 30 // house keeping cycle in seconds [default = 30 secs]

// Settings for BME680 environmental sensor
#define BME_TEMP_OFFSET 5.0f // Offset sensor on chip temp <-> ambient temp [default = 5°C]
#define STATE_SAVE_PERIOD UINT32_C(360 * 60 * 1000) // update every 360 minutes = 4 times a day
#define BMECYCLE 1 // bme sensor read cycle in seconds [default = 1 secs]

// OTA settings
#define USE_OTA 0 // set to 0 to disable OTA update
#define WIFI_MAX_TRY 5 // maximum number of wifi connect attempts for OTA update [default = 20]
#define OTA_MAX_TRY 5 // maximum number of attempts for OTA download and write to flash [default = 3]
#define OTA_MIN_BATT 50 // minimum battery level for OTA [percent]
#define RESPONSE_TIMEOUT_MS 60000 // firmware binary server connection timeout [milliseconds]

// settings for syncing time of node with a time source (network / gps / rtc / timeserver)
#define TIME_SYNC_LORAWAN 0 // set to 1 to use LORA network as time source, 0 means off [default = 1]
#define TIME_SYNC_LORASERVER 0 // set to 1 to use LORA timeserver as time source, 0 means off [default = 0]
#define TIME_SYNC_INTERVAL 0 // sync time attempt each .. minutes from time source [default = 60], 0 means off
#define TIME_SYNC_INTERVAL_RETRY 0 // retry time sync after lost sync each .. minutes [default = 10], 0 means off
#define TIME_SYNC_SAMPLES 1 // number of time requests for averaging, max. 255
#define TIME_SYNC_CYCLE 60 // delay between two time samples [seconds]
#define TIME_SYNC_TIMEOUT 400 // timeout waiting for timeserver answer [seconds]
#define TIME_SYNC_COMPILEDATE 0 // set to 1 to use compile date to initialize RTC after power outage [default = 0]

// time zone, see https://github.com/JChristensen/Timezone/blob/master/examples/WorldClock/WorldClock.ino
#define DAYLIGHT_TIME {"CEST", Last, Sun, Mar, 2, 120} // Central European Summer Time
#define STANDARD_TIME {"CET ", Last, Sun, Oct, 3, 60} // Central European Standard Time

// Ports on which the device sends and listens on LoRaWAN and SPI
#define COUNTERPORT 1 // counts
#define MACPORT 0 // network commands
#define RCMDPORT 2 // remote commands
#define STATUSPORT 2 // remote command results
#define CONFIGPORT 3 // config query results
#define GPSPORT 4 // gps - NOTE: set to 1 to send combined GPS+COUNTERPORT payload
#define BUTTONPORT 5 // button pressed signal
#define BEACONPORT 6 // beacon alarms
#define BMEPORT 7 // BME680 sensor
#define BATTPORT 8 // battery voltage
#define TIMEPORT 9 // time query and response
#define SENSOR1PORT 10 // user sensor #1
#define SENSOR2PORT 11 // user sensor #2
#define SENSOR3PORT 12 // user sensor #3

// Cayenne LPP Ports, see https://community.mydevices.com/t/cayenne-lpp-2-0/7510
#define CAYENNE_LPP1 1 // dynamic sensor payload (LPP 1.0)
#define CAYENNE_LPP2 2 // packed sensor payload (LPP 2.0)
#define CAYENNE_GPS 3 // full scale GPS payload
#define CAYENNE_ACTUATOR 10 // actuator commands
#define CAYENNE_DEVICECONFIG 11 // device period configuration
#define CAYENNE_SENSORREAD 13 // sensor period configuration
#define CAYENNE_SENSORENABLE 14 // sensor enable configuration

// MQTT settings, only needed if MQTT is used (#define HAS_MQTT in board hal file)
#define MQTT_ETHERNET 0 // select PHY: set 0 for Wifi, 1 for ethernet
#define MQTT_INTOPIC "paxin"
#define MQTT_OUTTOPIC "paxout"
#define MQTT_PORT 1883
#define MQTT_SERVER "public.cloud.shiftr.io"
#define MQTT_USER "public"
#define MQTT_PASSWD "public"
#define MQTT_RETRYSEC 20 // retry reconnect every 20 seconds
#define MQTT_KEEPALIVE 10 // keep alive interval in seconds
//#define MQTT_CLIENTNAME "my_paxcounter" // generated by default
```

cyberman54 commented 3 years ago

@tjvandam I don't see any suspicious settings here, all fine. MAC_QUEUE_SIZE 100 is unnecessary; 50 is far more than enough, even in dense environments, but 100 doesn't hurt. The current development version 2.5.x no longer needs MAC queuing, since it comes with a far more efficient storage mechanism. It is already stable and will be released as master soon.
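The timing values in the posted paxcounter.conf use scaled units, so it can help to convert them before judging how hard the device is working. The sketch below does that under two assumptions: "[seconds/2]" is read as steps of two seconds, and the sniffer is assumed to dwell on each channel from `WIFI_CHANNEL_MIN` to `WIFI_CHANNEL_MAX` once per sweep. The helper names are hypothetical and not part of the paxcounter code.

```c
/* Values copied from the paxcounter.conf posted above. */
#define SENDCYCLE 150
#define WIFI_CHANNEL_MIN 1
#define WIFI_CHANNEL_MAX 13
#define WIFI_CHANNEL_SWITCH_INTERVAL 50

/* Assumption: "[seconds/2]" means the value counts steps of 2 seconds. */
unsigned send_cycle_seconds(unsigned sendcycle) {
    return sendcycle * 2u;
}

/* "[seconds/100]": the config comment maps 50 to 0.5 s, i.e. 10 ms per unit. */
unsigned channel_dwell_ms(unsigned interval) {
    return interval * 10u;
}

/* Assumption: each channel from ch_min to ch_max is visited once per sweep. */
unsigned sweep_ms(unsigned ch_min, unsigned ch_max, unsigned interval) {
    return (ch_max - ch_min + 1u) * channel_dwell_ms(interval);
}
```

Under those assumptions the posted settings give a payload every 300 seconds and a full channel sweep every 6.5 seconds, which is an ordinary continuous-scan setup rather than an unusually heavy one.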

jcaron23 commented 3 years ago

@tjvandam When you connected the LoPy4 directly, what did you power it with?

Note that even USB can provide much higher voltage than 5V. In the case of your power bank, USB QC can go up to 12V, USB C PD can go up to 20V (more generally USB QC can go up to 20V as well).

It should only happen if the device supports it (based on negotiation in theory), but there may be something wrong going on here with the power supply providing a lot more voltage than it should.

Try to measure the voltage provided, it will probably explain the failures.
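Once a multimeter reading is available, the comparison is simple. The sketch below classifies a measured supply voltage against an input window; the 3.5 V and 5.5 V limits are assumptions used for illustration and should be checked against the Pycom datasheet, and `classify_vin` is a hypothetical helper.

```c
typedef enum { VIN_TOO_LOW, VIN_OK, VIN_TOO_HIGH } vin_state_t;

/* Assumed input window for the module's VIN pin, in millivolts.
   Placeholder limits: verify them against the Pycom datasheet. */
#define VIN_MIN_MV 3500
#define VIN_MAX_MV 5500

/* Classifies a measured supply voltage given in millivolts. */
vin_state_t classify_vin(int millivolts) {
    if (millivolts < VIN_MIN_MV) {
        return VIN_TOO_LOW;
    }
    if (millivolts > VIN_MAX_MV) {
        return VIN_TOO_HIGH;
    }
    return VIN_OK;
}
```

A reading near 5000 mV is classified as `VIN_OK`, while the 9 V, 12 V or 20 V levels that fast-charging modes can negotiate all come out as `VIN_TOO_HIGH`.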

cyberman54 commented 3 years ago

Closed, since this is obviously a power supply issue, and not paxcounter related.

cyberman54 commented 3 years ago

@tjvandam Please watch this thread on the Pycom forum to solve your problem. I am sure it is an overvoltage caused by a power supply issue. Please don't use your USB QC power bank with your Pycom hardware any longer while we are awaiting Pycom's statement. Please use a dumb charger instead, to protect your hardware.