Closed tjvandam closed 3 years ago
I have had several LoPy4s running continuously for more than half a year, and none of them has shown this behaviour so far. They run in cyclic mode, though, and WiFi only at the moment; I will switch to BLE scanning shortly.
I have the feeling it may also have to do with the Expansion Board 3. Could it be a damaging problem if I did not enable some of these lines? https://github.com/cyberman54/ESP32-Paxcounter/blob/master/src/hal/lopy4.h#L38
Another reason could be moisture or condensation shorting some of the pins: in both cases the LoPy4 stopped working in the middle of the night. If this is the case, the Pycom boards are very sensitive, since the device is in a large industrial IP67 enclosure with a vent plug, so no large amounts of moisture should be able to get in.
@tjvandam you can query the device's ESP32 CPU core temperature by sending a downlink rcommand 0x81. The device answers with some status information, including temperature and uptime. In my experience, the CPU core temperatures shown are about 30 to 40°C above ambient temperature, so you can expect values between 40 and 80°C, depending on your casing/cooling.
I have run paxcounter on a lot of different devices, including those from Pycom, and I have never damaged a module by software. Do you have more evidence that your board died from running paxcounter and not from, e.g., a hazard caused by your power supply?
I am running it on USB via the Expansion Board 3 using a Voltaic V88 power pack. There should be no strange power issues from this power supply, so the only thing that I can think of is moisture or a wrong use of the code by me. No clue how to debug because it happens so randomly.
Can you specify how the LoPy4 gets damaged beyond repair? What no longer works, and why do you suspect the LoPy4 is hardware damaged? Did you test another LoPy4 on the same expansion board? With how many LoPy4 boards did you run into this issue, a single board or more?
So I finally got some more time to test this.
Running the same code on multiple new Expansion Board 3 units and a new LoPy4, it ends in the same kind of failure. They stop working at some random point (after a few hours or days); in one case the LoPy4 is broken beyond repair, and in another case the Expansion Board 3 is broken beyond repair. You can smell that something has burned, and in the case of the broken Expansion Board 3 you can clearly see overheating and burning of components around the power supply of the board. At first I suspected a short circuit due to moisture build-up, but even after treatment with conformal coating the same thing happened. I have now even tested the code on a LoPy4 alone (without the Expansion Board), powered directly from USB 5V (GND and VIN pins), and again after 2 hours the whole thing stopped working and the board felt completely overheated.
How can this happen? Is the LoPy4 not capable of running this program with continuous scanning like this? I don't think so, but I have no clue how to debug this further without damaging more and more LoPy4s.
What type of power supply do you use? Did you request the core temperature (rcommand 0x81) and have some values?
I am running it on a V88 Portable Laptop Battery: https://voltaicsystems.com/v88/
I did not request the device status, since it was broken already, so not able to do so. What is a normal / max temperature you would expect?
Starting a fresh one, I get the following status info:
battery: 0 memory: 4194000 reset0: 1 reset1: 14 temp: 59 uptime: 602
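For logging several of these replies over time, the decoded status line can be turned into a dictionary. This is only a minimal sketch: it assumes the status arrives as the `key: value` text shown above, and the helper name `parse_status` is made up for illustration and is not part of paxcounter.

```python
import re


def parse_status(line):
    """Parse 'key: value' pairs from a decoded status line into a dict of ints."""
    # Each field looks like "temp: 59"; keys are word characters, values are integers.
    return {key: int(value) for key, value in re.findall(r"(\w+):\s*(-?\d+)", line)}


status = parse_status(
    "battery: 0 memory: 4194000 reset0: 1 reset1: 14 temp: 59 uptime: 602"
)
print(status["temp"], status["uptime"])
```

Storing the parsed values together with a timestamp makes it easy to compare temperature readings between requests.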
On our LoPy4 fleet, running on same expansion board, cased in Pycom case, installed indoors, we see CPU core temperatures at 40-50°C, which is fairly low. We have another installation in heated environment, running same software, but on a custom ESP32 board, not a LoPy4. Here we see core temperatures at 75°C. Both installations run > 100 days, and we never had heat issues or hardware damage.
We're not using power banks, but dumb usb chargers for power supply.
I would suggest that you either change your power supply to a dumb USB 5V charger (i.e. one without any USB charging intelligence), or switch to a power bank model that provides a dumb 5V USB output. You can enforce this by using a USB "charge only" cable, which cuts off the D+/D- lines.
The USB on the Pycom expansion board is driven by an MCU (PIC) with a Pycom-specific firmware. I don't know what this does, but maybe there is a USB handshake between the PIC and your power bank that results in some kind of overvoltage issue?
59°C core temp @ 20-25°C room temp seems ok. But watch this value if it climbs in your application. If yes, you probably have a power supply problem.
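A rough way to automate this check, as a sketch only: compare a reading against the 30 to 40°C offset above ambient mentioned earlier in this thread, and flag a series of readings that keeps rising. The function names, the sample readings and the `min_rise` threshold of 5°C are assumptions chosen for illustration, not values from paxcounter or Pycom.

```python
def expected_core_range(ambient_low, ambient_high, offset_low=30, offset_high=40):
    """Expected core temperature band (deg C), using the 30-40 degree offset from this thread."""
    return ambient_low + offset_low, ambient_high + offset_high


def is_climbing(temps, min_rise=5.0):
    """True if the newest reading is at least `min_rise` deg C above the oldest one."""
    if len(temps) < 2:
        return False  # a single reading cannot show a trend
    return temps[-1] - temps[0] >= min_rise


low, high = expected_core_range(20, 25)  # room temperature of 20-25 deg C
print(low <= 59 <= high)                 # the 59 deg C reading from above

# Hypothetical series of readings from repeated status requests.
stable = [59, 58, 60, 59]
rising = [59, 63, 68, 74]
print(is_climbing(stable), is_climbing(rising))
```

A single threshold like this is crude; it is only meant to show the idea of watching the trend rather than one absolute value.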
Super strange this.... I have replaced the USB cable with only + and - now, so there should be no handshake anymore. Let's see what happens. Are you also running them in continuous WiFi scan mode like me? So no sleep?
I peeked into the link of your power bank. You are using a model which provides USB power delivery, which means the device can supply voltages > 5V: "Hi-Voltage Laptop Port (12V, 16V, 19V or 24V)"
Maybe this happens, for some reason, when you connect it to a LoPy4.
What if you run any other kind of software on the LoPy4, while power setup is the same, did you test this?
--
And: Yes, our LoPy4 fleet is running continuously 24/7 in WiFi sniffing mode without sleep.
I am not using the High Voltage port, I am using the USB port, so this should not be the issue.
Are you willing to share your working code, so I can compare if there is some other issue?
The working code is the code of this repository, no differences.
PS: You may paste your lopy4.h and paxcounter.conf here, and I will compare them.
PPS: I would suggest raising this issue with Pycom. I think it doesn't matter whether this issue is caused by running the paxcounter software (which I still don't expect): the hardware design of the LoPy4 should never allow any kind of software to burn the hardware. That would be a bad hardware design.
Edit: I posted the issue on pycom forum, requesting feedback.
lopy4.h
```c
// clang-format off
// upload_speed 921600
// board lopy4

// Hardware related definitions for Pycom LoPy4 Board

//#defin HAS_SPI 1 // comment out if device shall not send data via SPI
// pin definitions for local wired SPI slave interface
//#define SPI_MOSI GPIO_NUM_22
//#define SPI_MISO GPIO_NUM_33
//#define SPI_SCLK GPIO_NUM_26
//#define SPI_CS GPIO_NUM_36

// Note: Pins for LORA chip SPI interface come from board file pins_arduino.h

// select WIFI antenna (internal = onboard / external = u.fl socket)

// uncomment defines in this section ONLY if your LoPy lives on a PYTRACK BOARD
// #define HAS_GPS 1
// #define GPS_I2C GPIO_NUM_25, GPIO_NUM_26 // SDA (P22), SCL (P21)
// #define GPS_ADDR 0x10

// uncomment defines in this section ONLY if your LoPy lives on a EXPANSION BOARD
//#define HAS_LED (12) // use if LoPy is on Expansion Board, this has a user LED
//#define LED_ACTIVE_LOW 1 // use if LoPy is on Expansion Board, this has a user LED
//#define HAS_BUTTON (13) // user button on expansion board
//#define BUTTON_PULLUP 1 // Button need pullup instead of default pulldown
//#define BAT_MEASURE_ADC ADC1_GPIO39_CHANNEL // battery probe GPIO pin -> ADC1_CHANNEL_7
//#define BAT_VOLTAGE_DIVIDER 2 // voltage divider 1MOhm/1MOhm -> expansion board 3.0
//#define BAT_VOLTAGE_DIVIDER 4 // voltage divider 115kOhm/56kOhm -> expansion board 2.0
```
paxcounter.conf
```c
// clang-format off

// ----- Paxcounter user config file ------
//
// --> adapt to your needs and use case <--
//
// Note: After editing, before "build", use "clean" button in PlatformIO!

// Verbose enables additional serial debug output

// Payload send cycle and encoding

// MAC sniffing parameters

// BLE scan parameters

// Corona Exposure Notification Service(ENS) counter
// set to 1 if you want to enable this function [default=0]

// for additional sensors (added by some user)

/ Note: guide for setting bluetooth parameters

// WiFi scan parameters

// LoRa payload default parameters

// Hardware settings

// Settings for BME680 environmental sensor

// OTA settings

// settings for syncing time of node with a time source (network / gps / rtc / timeserver)

// time zone, see https://github.com/JChristensen/Timezone/blob/master/examples/WorldClock/WorldClock.ino

// Ports on which the device sends and listenes on LoRaWAN and SPI

// Cayenne LPP Ports, see https://community.mydevices.com/t/cayenne-lpp-2-0/7510

// MQTT settings, only needed if MQTT is used (#define HAS_MQTT in board hal file)
//#define MQTT_CLIENTNAME "my_paxcounter" // generated by default
```
@tjvandam I don't see any suspicious settings here, all fine. MAC_QUEUE_SIZE 100 is unnecessary; 50 is far more than enough, even in dense environments, but 100 doesn't hurt. The current development version 2.5.x no longer needs MAC queuing, since it comes with a far more efficient storage mechanism. It is already stable and will be released as master soon.
@tjvandam When you connected the LoPy4 directly, what did you power it with?
Note that even USB can provide much higher voltage than 5V. In the case of your power bank, USB QC can go up to 12V, USB C PD can go up to 20V (more generally USB QC can go up to 20V as well).
It should only happen if the device supports it (based on negotiation in theory), but there may be something wrong going on here with the power supply providing a lot more voltage than it should.
Try to measure the voltage provided, it will probably explain the failures.
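Once you have multimeter readings, a small helper can classify them. This is a sketch under stated assumptions: the 3.5 V to 5.5 V window is my understanding of the LoPy4's published input range and should be checked against Pycom's datasheet, and `classify_supply` is a hypothetical name used only for this example.

```python
def classify_supply(volts, v_min=3.5, v_max=5.5):
    """Classify a measured supply voltage against an assumed safe input window.

    v_min and v_max are assumptions (believed LoPy4 VIN range); verify them
    against the datasheet of the board you are actually powering.
    """
    if volts < v_min:
        return "undervoltage"
    if volts > v_max:
        return "overvoltage"
    return "ok"


# Nominal USB output, plus the higher levels a QC or PD source can negotiate.
for measured in (5.0, 9.0, 12.0, 20.0):
    print(f"{measured:.1f} V -> {classify_supply(measured)}")
```

Anything reported as overvoltage here would be consistent with the burned components described above, but only an actual measurement on the VIN pin can confirm it.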
Closed, since this is obviously a power supply issue, and not paxcounter related.
@tjvandam Please watch this thread on the Pycom forum to solve your problem. I am sure it is an overvoltage caused by a power supply issue. Please don't use your USB QC power bank with your Pycom hardware any longer while we are awaiting Pycom's statement. Please use a dumb charger instead, to protect your hardware.
I am not sure why this is happening, but when running the paxcounter on a Pycom LoPy4 (in a Pycom Expansion Board 3) powered over USB, the LoPy4 gets damaged beyond repair. It looks like an overheating issue to me, but that is hard to confirm, as it ran for 2 weeks without issues on the first run and only 1 day on the second run.
It runs WiFi scan only in continuous mode. No sleep.