Open messier433 opened 1 year ago
Is this related to this issue? https://github.com/esphome/issues/issues/3909
I have four DS18B20 sensors in a star configuration at the end of ~6m of CAT5e, using a 3.3kOhm pullup resistor. I started seeing "scratch pad checksum invalid!" errors after upgrading to 2022.12.3 from a much earlier version.
I tried different pullup resistor values, but to no avail.
The 20µs change suggested above seemed to provide no discernible improvement.
However, the change to the interrupt lock does indeed appear to fix the issue.
Also just noticed: comparing the implementation of DallasTemperatureSensor::read_scratch_pad() with some earlier builds (prior to upgrading to 2022.12.3), there was previously no interrupt lock at all. Otherwise the implementation looks identical.
I too am having this issue with 8 sensors on 4 different Dallas hubs since updating to 2022.12.8, seeing scratchpad checksum invalid errors every 15-20 mins on random sensors. This is a setup that never produced an error in 2 years of running prior to the update.
What do I need to do to implement the interrupt lock fix above?
Never mind, I re-read the thread and see I need to make the changes in esp_one_wire.cpp.
I’ll try it and see how I get on.
Summary of the discussion in the related issue #4543:
I believe the root of the problem relates to the Interrupt Lock introduced in PR 3181. This lock is not maintained for the duration of the operation in read_scratch_pad, hence why multiple sensors (on the same hub) collide. My proposed solution is described above and in #4543.
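To make the lock-scope point concrete, here is a minimal sketch of the two patterns being compared. MockInterruptLock, MockWire and both function names are hypothetical stand-ins written for this illustration; they are not the ESPHome classes, and the trace vector only records the order of operations.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the ESPHome classes.
static std::vector<std::string> g_trace;

// RAII guard: "locks" on construction, "unlocks" when it goes out of scope.
struct MockInterruptLock {
  MockInterruptLock() { g_trace.push_back("lock"); }
  ~MockInterruptLock() { g_trace.push_back("unlock"); }
};

struct MockWire {
  bool reset() { g_trace.push_back("reset"); return true; }
  void select() { g_trace.push_back("select"); }
  unsigned char read8() { g_trace.push_back("read8"); return 0; }
};

// Pattern described as problematic: the lock is released after reset(),
// so other work can run before the select/read sequence starts.
bool read_split_lock(MockWire &wire) {
  {
    MockInterruptLock lock;
    if (!wire.reset()) return false;
  }  // lock released here
  {
    MockInterruptLock lock;
    wire.select();
    wire.read8();
  }
  return true;
}

// Proposed pattern: one lock held across reset, select and read.
bool read_single_lock(MockWire &wire) {
  MockInterruptLock lock;
  if (!wire.reset()) return false;
  wire.select();
  wire.read8();
  return true;
}
```

In the split version the trace shows an unlock/lock pair between reset and select, which is the window this comment is pointing at. In the single-lock version that window does not exist.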
I am happy to test this or any alternative solutions as my setup can readily reproduce.
So I guess I will stop looking for a hardware issue. The system was running great for years reading all the fish tanks, and now CRC scratch_pad messages are being spammed across over 100 sensors. Oddly, it does not happen when running directly from a battery as a 5V power source.
It also seems like the ESP keeps having to reboot itself.
Strange behaviour. The WT32-ETH01 board has 4 ds18b20 sensors connected. When connected to web interface only, errors are very rare. But when also connected to ESPHome Wireless Logs, more than 50% of updates fail with error "Scratch pad checksum invalid!"
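For readers unfamiliar with the "Scratch pad checksum invalid!" message: the DS18B20 scratchpad is 9 bytes, and the last byte is a Dallas/Maxim CRC-8 over the first 8. A corrupted bit anywhere in the read makes the check fail. The following is a minimal sketch of that check, not the ESPHome implementation; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Dallas/Maxim 1-Wire CRC-8: polynomial x^8 + x^5 + x^4 + 1,
// processed LSB first (reflected constant 0x8C), initial value 0.
// Function names are hypothetical, chosen for this sketch.
uint8_t crc8_maxim(const uint8_t *data, std::size_t len) {
  uint8_t crc = 0;
  for (std::size_t i = 0; i < len; ++i) {
    uint8_t byte = data[i];
    for (int bit = 0; bit < 8; ++bit) {
      const bool mix = ((crc ^ byte) & 0x01) != 0;
      crc >>= 1;
      if (mix) crc ^= 0x8C;
      byte >>= 1;
    }
  }
  return crc;
}

// A 9-byte scratchpad passes when the CRC of bytes 0..7 equals byte 8.
bool scratch_pad_crc_ok(const uint8_t (&pad)[9]) {
  return crc8_maxim(pad, 8) == pad[8];
}
```

A single flipped bit in any of the nine bytes is enough to fail this check, which is why small timing disturbances during the read show up as checksum errors rather than as slightly wrong temperatures.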
The problem
Single-wire communication with several DS18B20 temperature sensors is unreliable and fails 90% of the time with CRC errors when 4 sensors are connected to a single bus. The same setup with custom Arduino firmware using pstolarz/OneWireNg^0.10.0 does not show the issue. The problem only started when more than 1 sensor was connected to the 1-Wire interface, suggesting a change in timing due to the added load on the bus. And indeed, increasing the timing_constant in the ESPHome function ESPOneWire::read_bit() in esp_one_wire.cpp to
uint32_t timing_constant = 20;//12;
fixed the issue.
In the comments it states:
// note: for reading we'll need very accurate timing, as the
// timing for the digital_read() is tight; according to the datasheet,
// we should read at the end of 16µs starting from the bus low
// typically, the ds18b20 pulls the line high after 11µs for a logical 1
// and 29µs for a logical 0
Therefore a timing constant of 20µs, which is right in the middle of 11µs and 29µs, seems to be the best choice, with the most margin for unknown bus delays. Edit: However, the datasheet (https://www.analog.com/media/en/technical-documentation/data-sheets/ds18b20.pdf) defines: "Output data from the DS18B20 is valid for 15µs after the falling edge that initiated the read time slot. Therefore, the master must release the bus and then sample the bus state within 15µs from the start of the slot." Thus the 20µs stated above is not compliant with the datasheet, even though it helped with my specific problem!
A 10kOhm pull-up was used on the bus line. Decreasing the pull-up value with more sensors connected would probably also do the trick. However, this was not tried since the setup is already installed in its final location with no quick access to the PCB.
Update after a few days of testing: The sensor reading still failed occasionally (~1% of the time) with CRC errors. This now also appeared on a new board where I had the chance to decrease the pull-up resistor value on the bus line from 10k down to 4.7k and 3.3k. However, decreasing the pull-up resistor value had no effect.
What seems to fix it completely now (after 1 day of testing) is to keep the interrupt lock in DallasTemperatureSensor::read_scratch_pad() for the complete function call (without releasing it after the wire->reset()). See the modified code below (this is in addition to the increased timing constant shown above).
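The margin arithmetic behind this can be written out as a small sketch. It assumes, as the reasoning in this issue does, that timing_constant corresponds to the sample point after the falling edge; the 11µs and 29µs figures come from the quoted source comment and the 15µs window from the datasheet. The struct and function names are hypothetical.

```cpp
#include <cassert>

// All values in microseconds, taken from the text of this issue.
constexpr int kLineHighForOne = 11;   // quoted comment: typical release for a logical 1
constexpr int kLineHighForZero = 29;  // quoted comment: typical release for a logical 0
constexpr int kDatasheetWindow = 15;  // datasheet: sample within 15 µs of slot start

// Hypothetical helper type for this sketch.
struct SampleMargins {
  int after_one;          // how long after a "1" has gone high we sample
  int before_zero;        // how long before a "0" goes high we sample
  bool within_datasheet;  // true if the sample point respects the 15 µs window
};

constexpr SampleMargins margins(int sample_us) {
  return {sample_us - kLineHighForOne, kLineHighForZero - sample_us,
          sample_us <= kDatasheetWindow};
}
```

With these numbers, a sample point of 12 leaves 1µs after a logical 1 and 17µs before a logical 0, while 20 leaves 9µs on each side but falls outside the 15µs window.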
bool IRAM_ATTR DallasTemperatureSensor::read_scratch_pad() {
  auto *wire = this->parent_->one_wire_;
  {
    InterruptLock lock;
    if (!wire->reset()) {
Which version of ESPHome has the issue?
2022.12.3
What type of installation are you using?
Home Assistant Add-on
Which version of Home Assistant has the issue?
No response
What platform are you using?
ESP32
Board
ESP32s2 - custom PCB
Component causing the issue
dallas
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response