esphome / issues

Issue Tracker for ESPHome
https://esphome.io/

DS18b20 timing issues with multiple sensors #3980

Open messier433 opened 1 year ago

messier433 commented 1 year ago

The problem

Single-wire communication with several DS18B20 temperature sensors is unreliable and fails 90% of the time with CRC errors when 4 sensors are connected to a single bus. The same setup with custom Arduino firmware using pstolarz/OneWireNg^0.10.0 does not show the issue. The problem only started once more than one sensor was connected to the 1-Wire interface, which suggests a change in timing due to the added load on the bus. Indeed, increasing timing_constant in the ESPHome function ESPOneWire::read_bit() in esp_one_wire.cpp to

uint32_t timing_constant = 20;  // was 12

fixed the issue.

The comments in the code state:

// note: for reading we'll need very accurate timing, as the
// timing for the digital_read() is tight; according to the datasheet,
// we should read at the end of 16µs starting from the bus low
// typically, the ds18b20 pulls the line high after 11µs for a logical 1
// and 29µs for a logical 0

Therefore a timing constant of 20µs, which is right in the middle between 11µs and 29µs, seems to be the best choice, with the most margin for unknown bus delays.

Edit: However, the datasheet (https://www.analog.com/media/en/technical-documentation/data-sheets/ds18b20.pdf) defines: "Output data from the DS18B20 is valid for 15µs after the falling edge that initiated the read time slot. Therefore, the master must release the bus and then sample the bus state within 15µs from the start of the slot". Thus the 20µs stated above is not compliant with the datasheet, even though it helped with my specific problem!
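The margin argument above can be written down as a small calculation. This is only a sketch under a simplified model: it takes the 11µs and 29µs figures from the quoted source comment and the 15µs limit from the quoted datasheet passage as fixed numbers, and it ignores rise time and bus capacitance. The names `sample_margin_us` and `within_datasheet` are made up for illustration and do not exist in ESPHome.

```cpp
// Typical values from the source comment quoted above (not guaranteed limits).
constexpr int kHighAfterOneUs = 11;   // line goes high after ~11 µs for a logical 1
constexpr int kHighAfterZeroUs = 29;  // line goes high after ~29 µs for a logical 0
// Limit from the datasheet passage quoted above.
constexpr int kDatasheetValidUs = 15;

// Distance (in µs) from the sample point to the nearer edge of the window
// in which a logical 0 and a logical 1 can be told apart.
constexpr int sample_margin_us(int sample_us) {
  return (sample_us - kHighAfterOneUs < kHighAfterZeroUs - sample_us)
             ? (sample_us - kHighAfterOneUs)
             : (kHighAfterZeroUs - sample_us);
}

// True if the sample point respects the 15 µs validity window of the datasheet.
constexpr bool within_datasheet(int sample_us) {
  return sample_us <= kDatasheetValidUs;
}

static_assert(sample_margin_us(12) == 1, "12 us leaves only 1 us of margin");
static_assert(sample_margin_us(20) == 9, "20 us is centered in the window");
static_assert(!within_datasheet(20), "but 20 us is past the datasheet limit");
```

Under these assumptions, 12µs is compliant but sits only 1µs after the typical rising edge of a logical 1, while 20µs has the most margin on both sides but falls outside the window the datasheet guarantees.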

A 10kOhm pull-up was used on the bus line. Decreasing the pull-up resistance with more sensors connected would probably also do the trick. However, this was not tried since the setup is already installed in its final location with no quick access to the PCB.

Update after a few days of testing: The sensor reading still failed occasionally (~1% of the time) with CRC errors. This now also appeared on a new board, where I had the chance to decrease the pull-up resistor value on the bus line from 10k to 4.7k and 3.3k. However, decreasing the pull-up resistor value had no effect.

What now seems to fix it completely (after 1 day of testing) is to hold the interrupt lock in DallasTemperatureSensor::read_scratch_pad() for the complete function call (without releasing it after wire->reset()). See the modified code below (this is in addition to the increased timing constant shown above).

bool IRAM_ATTR DallasTemperatureSensor::read_scratch_pad() {
  auto *wire = this->parent_->one_wire_;

  {
    InterruptLock lock;

    if (!wire->reset()) {
      return false;
    }
  //}
  //{
    //InterruptLock lock;

    wire->select(this->address_);
    wire->write8(DALLAS_COMMAND_READ_SCRATCH_PAD);

    for (unsigned char &i : this->scratchpad) {
      i = wire->read8();
    }
  }
  ...
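To make the effect of the commented-out lines easier to see, here is a minimal host-side sketch of the two locking patterns. It assumes that InterruptLock behaves like a scope guard (interrupts masked from construction to destruction), which is what the code above suggests. `MockInterruptLock`, the counters, and both functions are hypothetical stand-ins and not ESPHome code; no real bus access takes place.

```cpp
#include <cassert>

int g_lock_depth = 0;      // > 0 while the mock "interrupts" are masked
int g_locked_regions = 0;  // number of separate locked regions entered

// Hypothetical stand-in for InterruptLock: only counts, touches no hardware.
struct MockInterruptLock {
  MockInterruptLock() { ++g_lock_depth; ++g_locked_regions; }
  ~MockInterruptLock() { assert(g_lock_depth > 0); --g_lock_depth; }
};

// Pattern before the change: the lock is released after the reset and taken
// again for select/command/read. Returns the number of points inside one
// transaction at which pending interrupts can run.
int gaps_with_split_lock() {
  g_locked_regions = 0;
  {
    MockInterruptLock lock;
    // wire->reset() would run here
  }
  // lock released: an interrupt may delay the transaction at this point
  {
    MockInterruptLock lock;
    // wire->select(), wire->write8() and the wire->read8() calls would run here
  }
  return g_locked_regions - 1;
}

// Pattern after the change: one lock spans the whole transaction.
int gaps_with_single_lock() {
  g_locked_regions = 0;
  {
    MockInterruptLock lock;
    // reset, select, write8 and all read8 calls share one locked region
  }
  return g_locked_regions - 1;
}
```

In this model `gaps_with_split_lock()` returns 1 and `gaps_with_single_lock()` returns 0. The trade-off is that the single-lock variant keeps interrupts masked for longer, which may matter for other time-critical code on the same chip.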

Which version of ESPHome has the issue?

2022.12.3

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

No response

What platform are you using?

ESP32

Board

ESP32s2 - custom PCB

Component causing the issue

dallas

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

saschaludwig commented 1 year ago

Is this related to this issue? https://github.com/esphome/issues/issues/3909

JamieR007 commented 1 year ago

I have four DS18B20 sensors in a star configuration at the end of ~6m of CAT5e, using a 3.3kOhm pullup resistor. I started seeing "scratch pad checksum invalid!" errors after upgrading to 2022.12.3 from a much earlier version.

I tried different pullup resistor values, but to no avail.

The 20µs change suggested above seemed to provide no discernable improvement.

However, the change to the interrupt lock does indeed appear to fix the issue.

JamieR007 commented 1 year ago

Also just noticed: comparing the implementation of DallasTemperatureSensor::read_scratch_pad() with some earlier builds (prior to upgrading to 2022.12.3), there was previously no interrupt lock at all. Otherwise the implementation looks identical.

craggyh commented 1 year ago

I too am having this issue with 8 sensors on 4 different Dallas hubs since updating to 2022.12.8, seeing "scratchpad checksum invalid" errors every 15-20 minutes on random sensors. This is a setup that never produced an error in 2 years of running prior to the update.

What do I need to do to implement the interrupt lock fix above?

craggyh commented 1 year ago

Never mind, I re-read the thread and see I need to make the changes in esp_one_wire.cpp.

I’ll try it and see how I get on.

JamieR007 commented 12 months ago

Summary of the discussion in the related issue #4543:

I believe the root of the problem relates to the Interrupt Lock introduced in PR 3181. This lock is not maintained for the duration of the operation in read_scratch_pad, hence why multiple sensors (on the same hub) collide. My proposed solution is described above and in #4543.

I am happy to test this or any alternative solutions as my setup can readily reproduce.

MasterCATZ commented 11 months ago

So I guess I will stop looking for a hardware issue. The system was running great for years reading all the fish tanks; now I am getting CRC scratch_pad messages spammed over 100 sensors. Oddly, it does not happen when running directly from a battery as a 5V power source?

It also seems like the ESP keeps having to reboot itself.

vdemidov commented 7 months ago

Strange behaviour. The WT32-ETH01 board has 4 DS18B20 sensors connected. When only the web interface is connected, errors are very rare. But when ESPHome Wireless Logs is also connected, more than 50% of updates fail with the error "Scratch pad checksum invalid!"
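For readers unfamiliar with the check behind the "Scratch pad checksum invalid!" message: the DS18B20 scratchpad is 9 bytes long, and the last byte is a CRC-8 over the first 8 bytes (polynomial x^8 + x^5 + x^4 + 1 per the datasheet). The following is a generic sketch of that check, not ESPHome's own implementation; the function names are chosen here for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Dallas/Maxim CRC-8: polynomial x^8 + x^5 + x^4 + 1, bits processed LSB
// first (hence the reflected constant 0x8C), initial value 0.
std::uint8_t dallas_crc8(const std::uint8_t *data, std::size_t len) {
  std::uint8_t crc = 0;
  for (std::size_t i = 0; i < len; ++i) {
    std::uint8_t byte = data[i];
    for (int bit = 0; bit < 8; ++bit) {
      const bool mix = ((crc ^ byte) & 0x01) != 0;
      crc >>= 1;
      if (mix) {
        crc ^= 0x8C;
      }
      byte >>= 1;
    }
  }
  return crc;
}

// A scratchpad is consistent when the CRC of bytes 0..7 equals byte 8.
bool scratchpad_crc_ok(const std::uint8_t (&scratchpad)[9]) {
  return dallas_crc8(scratchpad, 8) == scratchpad[8];
}
```

A single bit sampled at the wrong moment anywhere in the 72-bit read is enough to make this check fail, which is consistent with timing disturbances showing up as checksum errors rather than as wrong temperatures.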