Closed polleke69 closed 2 years ago
Hey there @habbie, mind taking a look at this issue as it has been labeled with an integration (pm1006
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
I modded my Vindriktning, and I will monitor if I see the same behavior.
Which version of ESPHome has the issue?
2021.x
Can you be more specific than this?
At the moment i am running 2021.10.3, but all previous versions had the same issue (i bought the vindriktnings in september...) If i remember correctly the 2020 esphome versions had the pm1006 integrated. So lets go from there...
If I'm correct, your devices reboot about every 10 hours. My device is running with an uptime sensor now, so I'll check tomorrow to see if there have been any reboots. In case there are, the most common issue would be a memory leak in the firmware.
Well, it seems to be more eratic:
I agree, that doesn't look regular enough for a memory leak issue.
If it helps, some logging from serial:
[01:24:58] [01:24:58] ets Jan 8 2013,rst cause:4, boot mode:(3,7) [01:24:58] [01:24:58]wdt reset [01:24:58]load 0x4010f000, len 3584, room 16 [01:24:58]tail 0 [01:24:58]chksum 0xb0 [01:24:58]csum 0xb0 [01:24:58]v2843a5ac [01:24:58]~ld [01:24:58][I][logger:174]: Log initialized
This is a bit of a random guess, but all non-crashing configs I saw before did not have the filter. Can you try running one of them with pm1006 but without the filter?
I've got only one running here, and I added the filter to be sure to have the same setup as @polleke69. So far I haven't seen any crashes. 11 hours and counting.
The crashing one - does it report reliably while it is running? Looking at the existing code (I have a refactor sitting in a branch that I should really finish) I wonder how well it would deal with long strings of garbage. It should do fine, but it's not the greatest code.
Oh, I missed your serial output. In there, I see rst cause:4
.
That would mean that the WDT (watch dog timer) did reset the device AFAIK.
And that could be caused by some code locking up the device.
I don't see what part of the code that would be, so I'll keep running my setup.
To be sure: can you try another USB power supply, to rule out that the device isn't getting enough power right now? Underpowered devices can run into all kinds of issues.
I did not look up cause 4 but I also spotted wdt, which is why I wonder about garbage strings :)
That would have to be really large garbage strings to make those lock up your loop implementation.
Since the check for available()
bytes on the UART would have to be true until the WDT is triggered.
From what I see on my device using the UART debugger, it's small chunks once in a while though.
The reader could focus a bit more on finding 0x16, 0x11, 0x0B
in the serial data to find the start of the message and work from there (that's what's in your new branch?) That would re-align the reader with the output when the reader gets out of sync with the data from the sensor. But even when that goes wrong somehow because of garbage data, worst that could happen (if I read your code correctly, but I must admit it's beyond bed time for me) is missing data and log output filled with CRC mismatches.
Same time here I suspect! The loop
code -should- exit because check_byte_
triggers on the max message length as well, and I'd expect the watchdog to be happy then.
Do you happen to be updating any code using OTA? My device hangs when using UART and attempting to OTA update and then it will go into a reboot loop for quite awhile after that. If I flash it with a USB cable, it seems stable.
I don't think I'm seeing the same number of reboots that you are, but if you are looking for things to try, you might try removing ota:
A lot of suggestions ;-)
Tonight/this weekend i will give it a go without the filter and separately without ota: and keep u posted.
Normally i use ota: without any issues, but i also tried full erasing and flashing via serial with no difference in the reboot behaviour.
I never saw any issues with reporting of the units when pm1006 is active.
I already exchanged about 5 (different) power adapters, so i would say that rules out any power issues.
Removing ota: shouldn't be needed. THere can be ota issues, when software UART is used. My guess is that the interrupt handling for incoming data is interfering with the network stack, causing an onging OTA update to fail Normally, a few retries fixes this for me. That's why I created another ticket in which I propose to make sure that hardware UART is used for this device.
I'm afraid my device is still running, so I haven't been able to reproduce this yet.
Enabled pm1006 without filter on vindriktning1:
And rebooting again...
And guess what? Another day with no reboots here.
Can you maybe try soldering the Rx D2 on D7 instead and setting (next to the D2 to D7 pin change):
logger:
baud_rate: 0
That should give you hardware uart instead of software uart. That gives us one more component we can rule out (or in) as the culprit.
It seems the one running WITHOUT the filter is "a bit" more stable
So both are unstable ;-)
Right!
Next thing to try will be the hardware uart change... (somewhere this week)
Currently running 1 vindriktning on D2 and 1 running on D7 (hardware uart) both with filter.
Fingers crossed...
logger: baud_rate: 0
almost 2 days and counting...
And that is the D7 one I presume?
vindriktning1 = D7 vindriktning2 = D2
Now the question remaining is: why does the software uart causes these reboots?
Yes, especially because I have software UART myself and my two devices are running without hickups :-/ Could always be a hardware issue of course, so it would be useful to know if others are experiencing this behavior too.
what brand of esp's are u using?
Some D1 Mini clone from China, not branded.
Hmm, same here...
I think i'm just gonna change the other one as well and leave it at that...
Cool that it is stable this way. Uncool that we don't know why you get reboot. The balance tips to "cool" 😎👍
Yeah, but for the uncool part it would be nice to know why indeed...
What would be helpful in this case, is to get some insight in what the device logs during the reboots. If you would like to check this out, you could do the following:
If u scroll back a couple of posts, i posted some logging there. (if u need more, lemme know)
Yeah, that showed the WDT reset, so some component locking up the loop. So that was a start.
But when this happens, there should also be a stack trace right before that reboot message.
When using the esphome
command line tool for viewing the serial logging, it will decode the stack trace into code files + line numbers for you. When using an other tool, you'd get a list of hex addresses that would need manual decoding (for which you'd also nee the firmware.elf file that is created during the build).
Optimal result would be to get a hint about where exactly in the code the issue occurs.
Another thing that might be interesting, is to enable the UART debugger on the software UART, so see if the data comes in as expected, or if there's garbage on the line there. You could extend the uart config for vindrinktning2 as follows for this:
# Ikea Vindriktning PM2.5 sensor
uart:
rx_pin: D2
baud_rate: 9600
debug:
direction: RX
after:
bytes: 60
sequence:
- lambda: UARTDebug::log_hex(direction, bytes, ':');
This was all...
[05:37:59][D][pm1006:091]: Got PM2.5 Concentration: 326 µg/m³ [05:37:59][D][uart_debug:109]: <<< 16:11:0B:00:00:01:46:00:00:07:3A:00:00:01:7B:02:00:00:44:84 [05:38:01][D][sensor:113]: 'vindriktning2 WiFi Signal': Sending state -65.00000 dBm with 0 decimals of accuracy [05:38:01][D][pm1006:091]: Got PM2.5 Concentration: 327 µg/m³ [05:38:01][D][uart_debug:109]: <<< 16:11:0B:00:00:01:47:00:00:07:3B:00:00:01:7C:02:00:00:44:81 [05:38:12] [05:38:12] ets Jan 8 2013,rst cause:4, boot mode:(3,6) [05:38:12] [05:38:12]wdt reset [05:38:12]load 0x4010f000, len 3584, room 16 [05:38:12]tail 0 [05:38:12]chksum 0xb0 [05:38:12]csum 0xb0 [05:38:12]v2843a5ac [05:38:12]~ld [05:38:12][I][logger:214]: Log initialized [05:38:12][C][ota:452]: There have been 0 suspected unsuccessful boot attempts. [05:38:12][I][app:029]: Running through setup()... [05:38:12][C][uart.arduino_esp8266:052]: Setting up UART bus... [05:38:12][D][text_sensor:067]: 'vindriktning2 ESPHome': Sending state '2021.11.1 Nov 25 2021, 17:15:43'
Wow, that is quite silent indeed. The UART data looks okay. These are logged after the accompanying "Got PM2.5..." messages, because the debugger only logs the data after seeing no new data come in for a little while.
The data come in about once per minute looking at that output, so the next dat would arrive around 05:39. Way before that, the device reboots. So that makes it even more confusing, in case it is related so software UART, since at the time of the reboot, there is most likely no serial data coming in.
You can change bytes: 60
to bytes: 1
, which will cause the debugger to log every UART byte directly on a single log line. That way we would know for sure if there's really no data coming in, that is possibly tripping the device.
Not a lot of usefull info i guess...
[11:13:35][D][pm1006:091]: Got PM2.5 Concentration: 421 µg/m³ [11:13:37][D][uart_debug:109]: <<< 16 [11:13:37][D][uart_debug:109]: <<< 11 [11:13:37][D][uart_debug:109]: <<< 0B [11:13:37][D][uart_debug:109]: <<< 00 [11:13:37][D][uart_debug:109]: <<< 00 [11:13:37][D][uart_debug:109]: <<< 01 [11:13:37][D][uart_debug:109]: <<< A5 [11:13:37][D][uart_debug:109]: <<< 00 [11:13:37][D][uart_debug:109]: <<< 00 [11:13:37][D][uart_debug:109]: <<< 08 [11:13:37][D][uart_debug:109]: <<< 6B [11:13:37][D][uart_debug:109]: <<< 00 [11:13:37][D][uart_debug:109]: <<< 00 [11:13:37][D][uart_debug:109]: <<< 01 [11:13:37][D][uart_debug:109]: <<< D5 [11:13:37][D][uart_debug:109]: <<< 02 [11:13:37][D][uart_debug:109]: <<< 00 [11:13:37][D][uart_debug:109]: <<< 00 [11:13:37][D][uart_debug:109]: <<< 44 [11:13:37][D][uart_debug:109]: <<< 99 [11:13:37][D][pm1006:091]: Got PM2.5 Concentration: 421 µg/m³ [11:13:37][D][sensor:113]: 'vindriktning2 PM25': Sending state 418.89999 µg/m³ with 2 decimals of accuracy [11:13:58][D][uart_debug:109]: <<< 16 [11:13:58][D][uart_debug:109]: <<< 11 [11:13:58][D][uart_debug:109]: <<< 0B [11:13:58][D][uart_debug:109]: <<< 00 [11:13:58][D][uart_debug:109]: <<< 00 [11:13:58][D][uart_debug:109]: <<< 01 [11:13:58][D][uart_debug:109]: <<< AB [11:13:58][D][uart_debug:109]: <<< 00 [11:13:58][D][uart_debug:109]: <<< 00 [11:13:58][D][uart_debug:109]: <<< 08 [11:13:58][D][uart_debug:109]: <<< 7B [11:13:58][D][uart_debug:109]: <<< 00 [11:13:58][D][uart_debug:109]: <<< 00 [11:13:58][D][uart_debug:109]: <<< 01 [11:13:58][D][uart_debug:109]: <<< DB [11:13:58][D][uart_debug:109]: <<< 02 [11:13:58][D][uart_debug:109]: <<< 00 [11:13:58][D][uart_debug:109]: <<< 00 [11:13:58][D][uart_debug:109]: <<< 44 [11:13:58][D][uart_debug:109]: <<< 7D [11:13:58][D][pm1006:091]: Got PM2.5 Concentration: 427 µg/m³ [11:14:08] [11:14:08] ets Jan 8 2013,rst cause:4, boot mode:(3,6) [11:14:08] [11:14:08]wdt reset [11:14:08]load 0x4010f000, len 3584, room 16 [11:14:08]tail 0 [11:14:08]chksum 0xb0 [11:14:08]csum 0xb0 [11:14:08]v2843a5ac [11:14:08]~ld [11:14:08][I][logger:214]: Log initialized [11:14:08][C][ota:452]: There have been 0 suspected unsuccessful boot attempts. [11:14:08][I][app:029]: Running through setup()... [11:14:08][C][uart.arduino_esp8266:052]: Setting up UART bus... [11:14:08][D][text_sensor:067]: 'vindriktning2 ESPHome': Sending state '2021.11.1 Nov 27 2021, 09:10:48' [11:14:08][D][binary_sensor:034]: 'vindriktning2 status': Sending initial state OFF [11:14:08][C][wifi:037]: Setting up WiFi... [11:14:08][I][wifi:245]: WiFi Connecting to 'crap2'... [11:14:08][D][sensor:113]: 'vindriktning2 Uptime': Sending state 0.11800 s with 0 decimals of accuracy [11:14:08][D][uart_debug:109]: <<< 16 [11:14:08][D][uart_debug:109]: <<< 11 [11:14:08][D][uart_debug:109]: <<< 0B [11:14:08][D][uart_debug:109]: <<< 00 [11:14:08][D][uart_debug:109]: <<< 00 [11:14:08][D][uart_debug:109]: <<< 01 [11:14:08][D][uart_debug:109]: <<< AF [11:14:08][D][uart_debug:109]: <<< 00 [11:14:08][D][uart_debug:109]: <<< 00 [11:14:08][D][uart_debug:109]: <<< 08 [11:14:08][D][uart_debug:109]: <<< 8B [11:14:08][D][uart_debug:109]: <<< 00 [11:14:08][D][uart_debug:109]: <<< 00 [11:14:08][D][uart_debug:109]: <<< 01 [11:14:08][D][uart_debug:109]: <<< DF [11:14:08][D][uart_debug:109]: <<< 02 [11:14:08][D][uart_debug:109]: <<< 00 [11:14:08][D][uart_debug:109]: <<< 00 [11:14:08][D][uart_debug:109]: <<< 44 [11:14:08][D][uart_debug:109]: <<< 65 [11:14:08][D][pm1006:091]: Got PM2.5 Concentration: 431 µg/m³ [11:15:17][D][uart_debug:109]: <<< 00 [11:15:17][D][uart_debug:109]: <<< 00 [11:15:17][D][uart_debug:109]: <<< 44 [11:15:17][D][uart_debug:109]: <<< 57 [11:15:17][D][pm1006:091]: Got PM2.5 Concentration: 486 µg/m³
Well, useful in the sense that we now know that it's not incoming serial garbage data that is locking up the device. but you're right, this only thickens the mystery in a way ;)
Maybe it is not a software bug? The German magazine Make has connected a D1 mini with the IKEA board using two resistors (3k/1k) to take care of the voltage differences between the IKEA sensor voltage (4,5V) and the specified D1 Mini voltage of 3,3 V. The wiring in the HA community is directly connecting without a voltage adjustment. This would lead to higher than specified voltages at the serial input pin of the D1 and may cause the unstable UAT.
Got that, but how does that explain that connecting the sensor output to D7 (hardware uart) doesnt cause reboots?
As far as i know that would also operate at 3.3v then...
Also, the ESP8266 is supposedly 5V tolerant. Never found a 100% sure report on that, but it does seem true based on my experience. Also: my own Vindriktning is running on D2 with software UART, and that one doesn't show any issues at all. That doesn't support a general issue case either.
Conundrum I say.
Other than it rebooting sometimes, does it interrupt the reporting of data to HomeAssistant? I have the same configuration with a unbranded D1 mini with the same filter only I'm sending it every 7 on each parameter. I can't really say if it is rebooting as I haven't watched it that close, but I get consistent readings in HA so even if it was rebooting, it doesn't seem to be impacting my desired output.
I've got the filter at 30s so i don't miss a whole lot of data (probably nothing).
I'm just gonna connect to D7 and leave it at that...
Maybe that's the way to go. At least I am out of options to come up with tests to pinpoint the actual issue here. I'm willing to go with "something hardware yada yada", and be happy with the D7 solution for it. So feel free to close this issue when you reached the happy puppy stadium.
The problem
Hello, I am using 2 ikea vindriktning pm1006 sensors with esphome. The problem is that both units keep rebooting on a regular basis. In a period of about 8 days both units rebooted 20 (!) times.
Both are running the exact same esphome config. I tried disabling: logger, webserver, mDNS, AP, captive_portal but the issue remains.
Already hooked op 1 of the units on a serial line, but i didnt find any strange things in the logs...
Tried various different power adapters, no change.
At the moment 1 unit is running with the pm1006 sensor enabled, the other one without pm1006 sensor enabled. So far only the unit with the sensor enabled has rebooted 5 times...
To me this looks like a "bug" in the pm1006 sensor code...
Which version of ESPHome has the issue?
2021.x
What type of installation are you using?
pip
Which version of Home Assistant has the issue?
2021.x
What platform are you using?
ESP8266
Board
d1_mini
Component causing the issue
pm1006
Example YAML snippet
Anything in the logs that might be useful for us?
No response
Additional information
No response