v12.4.0 on Sonoff TH10 with SI7021 randomly reports false temperature and humidity values

lmagyar commented 1 year ago

PROBLEM DESCRIPTION

After updating to v12.4.0, my Sonoff devices started to randomly report invalid values. This looks to me like a communication failure with the sensor. With previous versions (v9.x .. v12.3.1) I saw some WiFi disconnects (approx. once a day, followed by an automatic restart, not a serious problem). I don't see these WiFi disconnects now, but I do see false temperature values, so my guess is that some WiFi communication issues or delays now cause sensor communication issues instead of dropping the WiFi connection.

I've tried Reset 3 plus a power cycle, unplugging and replugging the sensors, and multiple builds; nothing helped. Multiple devices have this issue, though not all of them (currently). I will downgrade to v12.3.1 to see if the issue disappears.

This happens randomly, so I need help with investigating further, e.g. how to get the Tasmota console log stored for 24h.

[Chart: Tasmota_temp]

Strangely, the reported values (not the averaged values on the charts above) are approx. half of the real temperature, or approx. minus half of it; see the reported values from the database:

[Screenshots: DB_temp_1, DB_temp_2]

REQUESTED INFORMATION

[x] Read the Contributing Guide and Policy and the Code of Conduct
[x] Searched the problem in issues
[x] Searched the problem in discussions
[x] Searched the problem in the docs
[x] Searched the problem in the chat
[x] Device used (e.g., Sonoff Basic): Sonoff TH + SI7021
[x] Tasmota binary firmware version number used: 12.4.0(sensors)
- [x] Pre-compiled
- [x] Self-compiled
[ ] Flashing tools used: _____

[x] Provide the output of command: Backlog Template; Module; GPIO 255:

12:16:25.147 CMD: Backlog Template; Module; GPIO 255
12:16:25.172 MQT: stat/bedroom/RESULT = {"NAME":"Sonoff TH - inverted relay","GPIO":[32,1,1,1,1,0,0,0,256,320,1,0,0,0],"FLAG":0,"BASE":4}
12:16:25.401 MQT: stat/bedroom/RESULT = {"Module":{"0":"Sonoff TH - inverted relay"}}
12:16:25.655 MQT: stat/bedroom/RESULT = {"GPIO0":{"32":"Button1"},"GPIO1":{"0":"None"},"GPIO2":{"0":"None"},"GPIO3":{"0":"None"},"GPIO4":{"0":"None"},"GPIO5":{"0":"None"},"GPIO9":{"0":"None"},"GPIO10":{"0":"None"},"GPIO12":{"256":"Relay_i1"},"GPIO13":{"320":"Led_i1"},"GPIO14":{"1248":"SI7021"},"GPIO15":{"0":"None"},"GPIO16":{"0":"None"},"GPIO17":{"0":"None"}}

[x] If using rules, provide the output of this command: Backlog Rule1; Rule2; Rule3:

12:18:05.478 CMD: Backlog Rule1; Rule2; Rule3
12:18:05.501 MQT: stat/bedroom/RESULT = {"Rule1":{"State":"ON","Once":"OFF","StopOnError":"OFF","Length":121,"Free":390,"Rules":"ON Wifi#Disconnected DO RuleTimer1 900 ENDON ON Wifi#Connected DO RuleTimer1 0 ENDON ON Rules#Timer=1 DO Restart 99 ENDON"}}
12:18:05.731 MQT: stat/bedroom/RESULT = {"Rule2":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}
12:18:05.934 MQT: stat/bedroom/RESULT = {"Rule3":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}

[x] Provide the output of this command: Status 0:

12:19:01.478 CMD: Status 0
12:19:01.486 MQT: stat/bedroom/STATUS = {"Status":{"Module":0,"DeviceName":"Bedroom","FriendlyName":["Bedroom Radiator"],"Topic":"bedroom","ButtonTopic":"0","Power":1,"PowerOnState":5,"LedState":0,"LedMask":"FFFF","SaveData":0,"SaveState":0,"SwitchTopic":"0","SwitchMode":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":0,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0,"StatusRetain":0}}
12:19:01.504 MQT: stat/bedroom/STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"http://ota.tasmota.com/tasmota/release/tasmota.bin.gz","RestartReason":"Exception","Uptime":"0T00:15:14","StartupUTC":"2023-02-19T11:03:47","Sleep":50,"CfgHolder":4617,"BootCount":71,"BCResetTime":"2020-12-31T18:45:52","SaveCount":215,"SaveAddress":"FB000"}}
12:19:01.517 MQT: stat/bedroom/STATUS2 = {"StatusFWR":{"Version":"12.4.0(sensors)","BuildDateTime":"2023-02-16T16:52:18","Boot":31,"Core":"2_7_4_9","SDK":"2.2.2-dev(38a443e)","CpuFrequency":80,"Hardware":"ESP8266EX","CR":"422/699"}}
12:19:01.531 MQT: stat/bedroom/STATUS3 = {"StatusLOG":{"SerialLog":2,"WebLog":2,"MqttLog":0,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["Z2",""],"TelePeriod":10,"Resolution":"558180C0","SetOption":["00008008","2805C80001000600003C5A0A002800000000","00000080","00006000","00004000","00000000"]}}
12:19:01.555 MQT: stat/bedroom/STATUS4 = {"StatusMEM":{"ProgramSize":676,"Free":324,"Heap":20,"ProgramFlashSize":1024,"FlashSize":1024,"FlashChipId":"14605E","FlashFrequency":40,"FlashMode":"DOUT","Features":["00000809","8F8A8587","0405A005","B7F7BFCF","05DA9BC4","64367CC7","00084052","20000000","54000020","0000C081"],"Drivers":"1,2,3,4,5,6,8,9,10,12,14,16,17,24,29,34,62,65,66","Sensors":"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,17,18,19,20,21,22,26,31,34,37,39,40,42,43,45,51,52,55,56,58,59,62,64,66,67,74,98,103"}}
12:19:01.574 MQT: stat/bedroom/STATUS5 = {"StatusNET":{"Hostname":"bedroom","IPAddress":"192.168.1.185","Gateway":"192.168.1.1","Subnetmask":"255.255.255.0","DNSServer1":"80.244.99.37","DNSServer2":"80.244.99.36","Mac":"A4:CF:12:D7:CC:B1","Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":17.0}}
12:19:01.587 MQT: stat/bedroom/STATUS6 = {"StatusMQT":{"MqttHost":"192.168.1.10","MqttPort":1883,"MqttClientMask":"DVES_%12X","MqttClient":"DVES_A4CF12D7CCB1","MqttUser":"Device","MqttCount":1,"MAX_PACKET_SIZE":1200,"KEEPALIVE":30,"SOCKET_TIMEOUT":4}}
12:19:01.604 MQT: stat/bedroom/STATUS7 = {"StatusTIM":{"UTC":"2023-02-19T11:19:01","Local":"2023-02-19T12:19:01","StartDST":"2023-03-26T02:00:00","EndDST":"2023-10-29T03:00:00","Timezone":99,"Sunrise":"07:51","Sunset":"18:16"}}
12:19:01.621 MQT: stat/bedroom/STATUS10 = {"StatusSNS":{"Time":"2023-02-19T12:19:01","SI7021":{"Temperature":24.8,"Humidity":46.0,"DewPoint":12.4},"TempUnit":"C"}}
12:19:01.634 MQT: stat/bedroom/STATUS11 = {"StatusSTS":{"Time":"2023-02-19T12:19:01","Uptime":"0T00:15:14","UptimeSec":914,"Heap":18,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":19,"MqttCount":1,"POWER":"ON","Wifi":{"AP":1,"SSId":"Z2","BSSId":"34:2C:C4:1B:98:80","Channel":1,"Mode":"11n","RSSI":32,"Signal":-84,"LinkCount":1,"Downtime":"0T00:00:04"}}}
12:19:01.658 MQT: stat/bedroom/STATUS12 = {"StatusSTK":{"Exception":29,"Reason":"Exception","EPC":["4025fbe0","00000000","401013f4"],"EXCVADDR":"00000000","DEPC":"00000000","CallChain":["40101eae","4024c16d","4024c4a9","4024c52e","4024c608","402580bf","402325d4","40258108","40232616","402326e4","4023271e","40258108","40236328","4029343e","4023757a","40101bef","40101e77","40101bef","40101e77","40293f1d","40237784","40243e42","4024c42a","4024c9f9","40258754","40228450","4025c54a","4025c5a8","401000e1","40228488","40258241"]}}

[ ] Set weblog to 4 and then, when you experience your issue, provide the output of the Console log:
```
Console output here:
```




Jason2866 commented 1 year ago

Please try the timing tweak option introduced lately. See issue #17944

lmagyar commented 1 year ago

OK, thank you, I will report the results in a few days.

lmagyar commented 1 year ago

TLDR:

For the SI7021, DhtDelay 480,40 works much better than the default 500,30.

The SI7021's initial LOW signal is only 60us wide instead of 80us, and the SI7021 is slow to pull the line LOW. This approx. 25us lag is part of the 60us, so only about 35us remains for the really LOW part of the LOW signal. With a 40-42us delay we can target the middle of this really LOW part.

Add a rule to reinitialize these values after each restart (DhtDelay values are currently not stored in the configuration):

Rule1 ON System#Init DO DhtDelay 480,40 ENDON
Rule1 1
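With several devices to configure, the two commands above can also be sent through Tasmota's HTTP API (`http://<device>/cm?cmnd=...`) instead of being typed into each console. A minimal Python sketch, assuming the device's web server is reachable without a password; the helper name `tasmota_command_url` and the host address are only illustrative:

```python
from urllib.parse import quote

def tasmota_command_url(host: str, command: str) -> str:
    """Build the URL that runs one console command via Tasmota's HTTP API."""
    # Percent-encode everything: spaces, '#' and ',' are not safe in a query string.
    return f"http://{host}/cm?cmnd={quote(command, safe='')}"

COMMANDS = [
    "Rule1 ON System#Init DO DhtDelay 480,40 ENDON",  # set the rule text
    "Rule1 1",                                        # enable the rule
]

if __name__ == "__main__":
    for cmd in COMMANDS:
        # Host taken from the Status 0 output above; replace it with your device.
        print(tasmota_command_url("192.168.1.185", cmd))
```

The sketch only builds the URLs; opening them (in a browser or with any HTTP client) is what actually runs the commands.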

FYI:

I have 4 original SI7021 at hand now (exactly as in https://github.com/arendst/Tasmota/issues/17944#issuecomment-1429765337)
I have read the source and #17944, https://github.com/letscontrolit/ESPEasy/issues/1798, https://github.com/arendst/Tasmota/issues/12180

Identified problem 1

For the SI7021, the default 500us in DhtDelay 500,30 is too high. I experimented with it, and values 440..520 are OK, so 500us plus some CPU delay probably gets too long too often, causing "Pin14 timeout waiting for pulse 0". Maybe 480, as a middle ground, will work better.

Identified problem 2

For the SI7021, the default 30us in DhtDelay 500,30 is too low. Sometimes the line voltage does not fall to LOW again fast enough after the initial 500us LOW signal, so (DhtExpectPulse(LOW) != UINT32_MAX) && (DhtExpectPulse(HIGH) != UINT32_MAX) returns (nearly) immediately and the data that is read is shifted by 1 bit. This usually causes checksum failures, but it seems that sometimes the checksum can still be OK, which would explain why the invalid temperature values are approx. halved: the bits are shifted.

With one of my SI7021 sensors, a 23us delay always fails, 25us always works, and 24us sometimes works and sometimes fails (additional spaces added by me):

DHT: Pin14 cycles (80/80)         58 27 58 32 58 33 58 32 58 33 58 32 58 32 58 110 58 101 58 102 ..
DHT: Pin14 read 01C300E6AA
DHT: Pin14 cycles (80/80)  49 105 58 32 58 33 58 32 59 32 58 32 59 32 58 32 58 109 59 101 ..
DHT: Pin14 checksum failure 80E1807355 =? 54
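The two frames logged above fit the 1-bit shift exactly, which can be checked with a few lines of Python. This is only a sketch: it assumes the DHT22-style frame layout (16-bit humidity, sign-and-magnitude 16-bit temperature, both in tenths, then a one-byte sum), and the function names are made up for illustration rather than taken from the Tasmota source.

```python
def checksum(frame: bytes) -> int:
    """Low 8 bits of the sum of the first four bytes."""
    return sum(frame[:4]) & 0xFF

def decode(frame: bytes) -> tuple:
    """Return (humidity %, temperature C), assuming the DHT22-style layout."""
    humidity = ((frame[0] << 8) | frame[1]) / 10
    temperature = (((frame[2] & 0x7F) << 8) | frame[3]) / 10
    if frame[2] & 0x80:  # top bit of byte 2 is the sign
        temperature = -temperature
    return humidity, temperature

def shift_right_one_bit(frame: bytes, leading_bit: int = 1) -> bytes:
    """Simulate the init pulse being read as an extra first bit:
    every real bit moves one place later and the last bit is lost."""
    value = int.from_bytes(frame, "big")
    shifted = (leading_bit << 39) | (value >> 1)
    return shifted.to_bytes(5, "big")

good = bytes.fromhex("01C300E6AA")   # the frame from the "read" line
bad = shift_right_one_bit(good)

print(bad.hex().upper(), f"{checksum(bad):02X}")  # 80E1807355 54
print(decode(good))  # (45.1, 23.0)
print(decode(bad))   # (3299.3, -11.5)
```

Under these assumptions the shifted frame is byte for byte the one in the checksum failure line, including the computed checksum 54. The temperature comes out halved because the whole value moves down one bit, and it turns negative here because the lowest humidity bit (1 in C3) slides into the sign bit; with the other good frame in this comment, 01D200F8CB, that bit is 0 and the shifted value would be +12.4 instead of 24.8.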

Why this happens even with the default 30us delay, approx. once a day and in bursts, I don't know. Maybe power issues caused by WiFi communication; I think we will never figure that out.

Deep dive into problem 2

Tested AM2301 for reference: it works correctly, even DhtDelay 2000,5 works (instead of the default DhtDelay 2000,50)

I've added the cycle counts of the 80us+80us LOW+HIGH init pulses to the log (commit).

Tested 4 approx. 2 years old SI7021:

- The 80us LOW pulse is only 60us long
- The SI7021 is slow to pull the line LOW after the pull-up HIGH, and this lag eats into the already short 60us LOW init signal
- We need a minimum delay of 20-25us, so the really LOW part of the LOW signal is at most 35-40us wide
- With a 40-42us delay we can target the middle of the LOW part of the LOW signal
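The arithmetic behind the 40-42us figure can be written out as a small Python sketch. It assumes that both the 60us pulse length and the 20-25us pull-down lag are measured from the moment the host releases the line; `low_window` is a hypothetical helper, not part of any firmware.

```python
def low_window(pulse_end_us: float, pull_down_lag_us: float) -> tuple:
    """Return (start, end, middle) of the interval in which the line is
    really LOW, in microseconds after the host releases the line."""
    start = pull_down_lag_us
    end = pulse_end_us
    return start, end, (start + end) / 2

for lag in (20, 25):
    start, end, middle = low_window(60, lag)
    print(f"lag {lag}us: LOW from {start} to {end}us, "
          f"width {end - start}us, middle {middle}us")
# lag 20us: LOW from 20 to 60us, width 40us, middle 40.0us
# lag 25us: LOW from 25 to 60us, width 35us, middle 42.5us
```

With the default 30us wait, the sample point sits only 5-10us after the line has really gone LOW, which leaves little margin for a slightly slower sensor or cable.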

DhtDelay 480,10

with additional log info: the cycle counts of the LOW+HIGH init signals
1us ≈ 1.4 cycles
the delay is intentionally too small

the initialization LOW+HIGH cycles land at the first bit's position: 56 + 104 cycles, i.e. approx. 40us + 75us

18:04:13.404 DHT: Pin14 cycles (80/80) 0 12, 56 104 52 32 58 33 58 32 58 32 59 32 58 32 58 32 58 109 58 101 ..
18:04:13.406 DHT: Pin14 checksum failure 80E9007C65 =? E5
18:04:17.384 DHT: Pin14 cycles (80/80) 0 12, 56 104 52 32 59 32 58 32 58 33 58 32 58 32 59 31 58 109 59 101 ..
18:04:17.386 DHT: Pin14 checksum failure 80E9007C65 =? E5
18:04:21.371 DHT: Pin14 cycles (80/80) 0 13, 56 104 52 33 58 32 58 32 59 32 58 32 58 33 58 32 58 109 58 101 ..
18:04:21.373 DHT: Pin14 checksum failure 80E9007C65 =? E5
18:04:25.402 DHT: Pin14 cycles (80/80) 0 12, 56 105 52 32 58 32 58 33 58 32 58 32 59 32 58 32 58 109 58 101 ..
18:04:25.404 DHT: Pin14 checksum failure 80E9007C65 =? E5
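How the init pulse turns into a bogus first bit can be seen by decoding the cycle counts by hand. The Python sketch below assumes the usual rule in DHT-style drivers (a bit is 1 when its HIGH phase lasts longer than its LOW phase) and the 1.4 cycles per microsecond from the note above; neither is copied from the Tasmota source.

```python
US_PER_CYCLE = 1 / 1.4  # from the note "1us = 1.4 cycle"

def bits_from_cycles(cycles):
    """Turn alternating LOW/HIGH cycle counts into bits:
    1 if the HIGH phase lasted longer than the LOW phase."""
    pairs = zip(cycles[0::2], cycles[1::2])
    return [1 if high > low else 0 for low, high in pairs]

# First ten LOW/HIGH pairs of the 18:04:13.404 line (after the "0 12," prefix).
cycles = [56, 104, 52, 32, 58, 33, 58, 32, 58, 32,
          59, 32, 58, 32, 58, 32, 58, 109, 58, 101]

bits = bits_from_cycles(cycles)
first_byte = int("".join(map(str, bits[:8])), 2)

print(bits)                 # [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(f"{first_byte:02X}")  # 80
print(round(56 * US_PER_CYCLE), round(104 * US_PER_CYCLE))  # 40 74
```

The 56/104 pair is the sensor's init pulse (about 40us LOW and 74us HIGH, the "40us + 75us" above), read as a leading 1 bit, which gives the 80 at the start of the failing frame 80E9007C65.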

DhtDelay 480,40

with additional log info: the cycle counts of the LOW+HIGH init signals
1us ≈ 1.4 cycles

we are targeting approx. the middle of the 56-cycle-long LOW signal, reading 29 cycles of it

18:03:09.462 DHT: Pin14 cycles (80/80) 29 110, 58 27 52 32 58 32 58 33 58 32 58 32 59 31 59 108 59 101 58 102 ..
18:03:09.465 DHT: Pin14 read 01D200F8CB
18:03:10.391 WIF: Sending Gratuitous ARP
18:03:13.405 DHT: Pin14 cycles (80/80) 29 110, 59 26 52 32 58 32 59 32 58 32 58 33 58 32 58 109 58 101 58 102 ..
18:03:13.407 DHT: Pin14 read 01D200F8CB
18:03:13.649 WIF: Checking connection...
18:03:17.384 DHT: Pin14 cycles (80/80) 30 110, 59 26 52 32 58 32 59 32 58 32 58 33 58 31 59 109 58 101 58 102 ..
18:03:17.386 DHT: Pin14 read 01D200F8CB

Solution

Rule1 ON System#Init DO DhtDelay 480,40 ENDON
Rule1 1

I've set up SysLog 4 and changed the config on 4 TH10/SI7021 devices. Now I see "Pin14 timeout waiting for pulse 0" approx. once an hour and "Pin14 checksum failure ..." approx. once every 4 hours, and the errors no longer come in bursts. So this seems to be working.
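To turn a day of collected log lines into such per-hour error rates, a small counting script is enough. The Python sketch below assumes lines in the console format shown in this thread (`HH:MM:SS.mmm DHT: PinN ...`); a syslog server will likely add its own prefix and timestamp, so the pattern may need adjusting, and the function name is only illustrative.

```python
import re
from collections import Counter

# Matches the hour and the kind of DHT message; "cycles" debug lines are ignored.
LINE = re.compile(
    r"(\d{2}):\d{2}:\d{2}\.\d{3} DHT: Pin\d+ (timeout|checksum failure|read)"
)

def dht_events_per_hour(lines):
    """Count DHT events per (hour, kind) from console-style log lines."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

sample = [
    "18:03:09.465 DHT: Pin14 read 01D200F8CB",
    "18:03:10.391 WIF: Sending Gratuitous ARP",
    "18:04:13.404 DHT: Pin14 cycles (80/80) 0 12, 56 104 52 32 ..",
    "18:04:13.406 DHT: Pin14 checksum failure 80E9007C65 =? E5",
    "19:55:32.488 DHT: Pin2 timeout waiting for pulse 0",
]

for (hour, kind), n in sorted(dht_events_per_hour(sample).items()):
    print(hour, kind, n)
```

Comparing the "timeout" and "checksum failure" counts against the "read" count per hour also shows whether errors arrive in bursts or are spread evenly.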

I will close this issue in a few days if it remains stable.

On WiFi dropout

After downgrading to firmware v12.3.1, the invalid sensor data issue disappeared but the WiFi dropout issue re-emerged. So I think these 2 issues share a root cause: the WiFi issue seems to have been solved by a code change in some WiFi library, and these delay tweaks can solve the invalid sensor value issue.

Jason2866 commented 1 year ago

Closing, since the sensor is working with the introduced command for tuning timing of the sensor.

njh commented 1 year ago

Thank you for your work on this @lmagyar 🙂

I have been trying to get a Sonoff TH10 with SI7021 sensor working and have found it to be very unstable, particularly when trying to use a longer (5m) cable. I need to do some more testing and graphing but changing the timing setting to 480,40 does seem to have improved things.

Is there a reason to not update the default timing values for SI7021 to 480,40? https://github.com/arendst/Tasmota/blob/01bb287436d8c87bbeb54f30e48116fa8071d07e/tasmota/tasmota_xsns_sensor/xsns_06_dht_v7.ino#L44

Is the #else case after #ifdef ESP8266 for ESP32? Why are the timings different? Because the code executes faster?

SimonFili commented 1 year ago

Not sure if someone will see this, but I have a similar issue with an AM2302 (AM2301) on V13.2.

When I soft restart the Tasmota, the AM2301 is timing out on Pin 2 (configured with GPIO2).

But when I remove the +5V pin to the sensor and reconnect, it works fine... until a soft restart. See the logs going from timeout to reading when I "reset" the +5V

19:55:32.488 DHT: Pin2 timeout waiting for pulse 0
19:55:36.455 DHT: Pin2 cycles (0/80)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ..
19:55:36.457 DHT: Pin2 timeout waiting for pulse 0
19:55:40.406 DHT: Pin2 cycles (80/80)  68 35 68 35 74 35 74 35 75 35 74 35 74 35 74 100 93 102 74 102 ..
19:55:40.408 DHT: Pin2 read 01E700CAB2

I have tried many DhtDelay values. Unsure if it's related.

Ideas?

lmagyar commented 1 year ago

I've restarted my Sonoff TH16 with AM2301, no problem.

The Pin2 cycles (0/80) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .. means it wasn't able to read anything. In my case, when there were timing issues, it was able to read "something", so there were non-zero values, usually shifted by 1 bit (due to the bad timing). So my guess is that constant zeros mean there is no communication at all. I think the pin configuration somehow goes wrong after the soft reset.

What is your hardware? ESP8266? Can you plug this sensor into another device with a different CPU (e.g. ESP8266 vs ESP32)? I'm just guessing here.

SimonFili commented 1 year ago

Yes, it's on an ESP8266EX. There's this old discussion about a similar issue: https://github.com/arendst/Tasmota/discussions/15631 I'm wondering now if GPIO2 is used somehow at boot time.

UPDATE: I think this is the issue: on this board GPIO2 belongs to an ESP-12F, where it is also used as UART1_TXD.

At power-on, the AM2301 works.
At soft restart, it does not.
Unsure if this is solvable by a new firmware or if it's in the core?

UPDATE2: I tried to put the AM2301 on GPIO3 (RXD), same issue: it does not work after a reset but is OK after a disconnect/reconnect of +5V.

SimonFili commented 1 year ago

There's also this discussion, which talks about a similar issue: https://github.com/arendst/Tasmota/discussions/17181

I'm wondering if a new firmware that simply delays the activation of the AM2301 code on the configured pin by 1 sec would solve the issue?

lmagyar commented 12 months ago

Yes, GPIO 0, 1, 2, 3, 15 and 16 are better avoided. I use GPIO 4, 5, 12, 13 and 14 until I run out of pins, then I start to read the specs.

UPDATE: some links: https://randomnerdtutorials.com/esp8266-pinout-reference-gpios https://rabbithole.wwwdotorg.org/2017/03/28/esp8266-gpio.html

lmagyar commented 10 months ago

Yesterday I tested the new Sonoff TH Origin THR316, which uses an ESP32-D0WD-V3 chip and a modified SI7021 called THS01. It is unstable with the default DhtDelay settings: after leaving it alone for 10-20 minutes (not using its UI), it starts to read null temperature and humidity values intermittently.

I experimented with it, and the default DhtDelay 400,30 values are wrong again.

The 400us LOW pulse going out to the sensor is right at the lower limit: with 390 it fails 100% of the time, with 395 it fails 50% of the time, and with 400 it fails after a few minutes. So I started to use my solution above, 480, and it has worked flawlessly for the past day.

The good news is that this THS01 replies with a LOW signal really fast: it works even with a single-digit or even 0 wait value instead of the default 30. It starts to fail at 80, so my previous value (40) also works (or at least does not break anything).

So in case of SI7021 or THS01 and ESP8266 or ESP32, the below values and workaround are still better than the current default:

Rule1 ON System#Init DO DhtDelay 480,40 ENDON
Rule1 1

[Images omitted: results with the default 400,30 and with 480,40]
