esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
288 stars 34 forks source link

pulse_meter seems to have broken with 2023.8.0 #4807

Open egite opened 9 months ago

egite commented 9 months ago

The problem

Greetings.

Upgrading from 2023.7.1 to 8.0, the pulse_meter function seems to have broken/changed. The below now causes the counter to go crazy and increment at a high rate endlessly. Removing the internal_filter causes the initial count after reset to be a randomly high number. The pulse count afterwards counts, though is several orders of magnitude higher than actual. Removing the debounce filters also causes endless increments.

I have a separate ESP8266 on an identical but separate water meter running the same code that no longer reports its count. The sensor shows as "unknown".

Please let me know how I can help.

Which version of ESPHome has the issue?

2023.8.0

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

2023.8.2

What platform are you using?

ESP8266

Board

No response

Component causing the issue

No response

Example YAML snippet

sensor:    
  - platform: pulse_meter
    pin: 
      number: GPIO5  #D1
      mode:  
        input: true
    name: "Pulse Meter"
    unit_of_measurement: "gal/min"
    internal_filter: 50ms   
    internal_filter_mode: EDGE
    accuracy_decimals: 1
    timeout: 10sec
    icon: "mdi:gauge"    
    filters: 
     - multiply: 0.0748052    
     - debounce: 50ms      
    total:
      name: "Total"
      accuracy_decimals: 3
      unit_of_measurement: "gal"
      device_class: water
      state_class: total_increasing
      filters: 
        - multiply: 0.0748052
        - debounce: 50ms

Anything in the logs that might be useful for us?

No response

Additional information

No response

Megu commented 9 months ago

Got fairly similar issue. "Unknown" value on ESP32 using 2023.8.0 and 2023.8.1 versions of ESPHome addon on Home Assistant 2023.8.2. No issues running 2023.7.1 version of ESPHome.


sensor:
    - platform: pulse_meter
      pin: GPIO19
      unit_of_measurement: 'kW'
      id: power1
      name: 'Power 1'
      device_class: "power"
      filters:
        - multiply: 0.06  # (60s/1000 pulses per kWh)
      internal_filter: $pulse_debounce
      total:
        id: electricity1
        name: Electricity 1
        unit_of_measurement: "kWh"
        accuracy_decimals: 3
        device_class: "energy"
        state_class: "total_increasing"
        filters:
          - multiply: 0.001
slallemand commented 9 months ago

Same issue as @Megu ; with version 2023.8 I have a "Unknown" value on my ESP8266. Reverted back to esphome 2023.7 and everything went back to normal and the pulse_meter is ok;

pmartyniuk commented 9 months ago

In addition, it is impossible to reset the counter using: pulse_meter.set_total_pulses I just tested this and resetting the counter again does not work

fabianschaefer commented 9 months ago

I also have another issue with pulse meter since the refactor. The "no pulse detected" code now always ignores the last pulse. In my case I have the timeout set to 10s. In the log snipped below, I get a pulse at 01:44:11 and another at 01:44:17, but at 01:44:21 (10 seconds after the penultimate pulse), I get the "No pulse detected for 10s message" and the sensor goes to 0, which is wrong:

[01:44:11][D][sensor:094]: 'water_meter_total': Sending state 38.00000 L with 0 decimals of accuracy
[01:44:15][D][pulse_meter:057]: No pulse detected for 10s, assuming 0 pulses/min
[01:44:15][D][sensor:094]: 'water_meter_flow': Sending state 0.00000 L/min with 0 decimals of accuracy
[01:44:17][D][sensor:094]: 'water_meter_total': Sending state 39.00000 L with 0 decimals of accuracy
[01:44:21][D][pulse_meter:057]: No pulse detected for 10s, assuming 0 pulses/min
[01:44:21][D][sensor:094]: 'water_meter_flow': Sending state 0.00000 L/min with 0 decimals of accuracy
[01:44:23][D][sensor:094]: 'water_meter_total': Sending state 40.00000 L with 0 decimals of accuracy
[01:44:27][D][pulse_meter:057]: No pulse detected for 10s, assuming 0 pulses/min
jmoery commented 9 months ago

I'm also having those issues on 2023.8.5. I'm using a pulse meter with a Shelly rgbw2 A/C input sensor to track switch presses. I get an unknown clause as an initial state and then the pulse meter doesn't reset on the timeout interval when pulses aren't detected.

fabianschaefer commented 9 months ago

@jesserockz can you have a look here, part of the pulse counter integration is broken since the refactor has been merged. Especially the "no pulse detected" mechanism doesn't behave as expected anymore (see my last comment).

CestLaGalere commented 9 months ago

problem having installed 2023.8.2 - power usage going from about 0.4kW to 170kW! The total has roughly tripled from about 20kWh / day to about 60kWh/day. No changes to yaml, system updates (core to 2023.8.3 & ESPHome to 2023.8.2)

  - platform: pulse_meter
    pin: GPIO13
    unit_of_measurement: 'kW'
    accuracy_decimals: 3
    name: 'Pulse Power'
    id: pulse_mtr
    filters:
      - multiply: 0.06  # (60s/1000 pulses per kWh)

    total:
      unit_of_measurement: 'kWh'
      name: 'pulse Total Energy'
      accuracy_decimals: 2
      filters:
        - multiply: 0.001  # (1/1000 pulses per kWh)
Grafton1046 commented 9 months ago

I also have another issue with pulse meter since the refactor. The "no pulse detected" code now always ignores the last pulse. In my case I have the timeout set to 10s. In the log snipped below, I get a pulse at 01:44:11 and another at 01:44:17, but at 01:44:21 (10 seconds after the penultimate pulse), I get the "No pulse detected for 10s message" and the sensor goes to 0, which is wrong:

The pulse meter I have setup to monitor wind speed doesn't seem to initially time out until after it's been triggered at least once, it will sit on unknown/NA status till it gets a single pulse, reports that value correctly, then times out as expected and reports 0

timn commented 8 months ago

I have the same issue. Accessing .state of the sensor gives nan. May I ask to rollback the latest change #4231 that apparently broke this? @TrentHouliston could you please take a look, or is there a way to fix forward?

The snippet that's breaking for me is this https://github.com/timn/esphome-meters/blob/main/gas/gas-meter-impuls.yaml.

I have confirmed that it works when downgrading to 2023.7.1:

docker run --rm -v "${PWD}":/config -it esphome/esphome:2023.7.1 run --device <ipaddr> <your_file.yaml>
Grafton1046 commented 8 months ago

Work around I've found is to publish a value for the sensor on boot.

TrentHouliston commented 8 months ago

Hi, I'll take a look sorry I didn't see this earlier until I was pinged.

So by the looks, there are several issues that people are having in this thread?

@egite The pulse counter is going crazy/reporting unknown

If the reporting unknown is only at the start before any pulses have happened, that's likely the same issue others are discussing with timeouts not starting until the first pulse has been received. As for the high pulse counts, do you have any other information or logs that could help me look into it?

@Megu @slallemand @jmoery The sensor is reporting unknown

Is this on startup before a single pulse has been detected?

@pmartyniuk resetting the total pulse count doesn't work?

This is surprising as I didn't even touch that code in my refactor, so it should still be working if it was working before. I'll look and see if I can see why it would have changed. I have been using this feature myself on the refactored version of the code for over 10 months now with no issues.

@fabianschaefer The timeouts are not working as expected

So based on your data the penultimate pulse is where the timeout is being measured from rather than the last actual pulse? I'll take a look but I'm struggling to see a mechanism in the code for that to happen.

@Grafton1046 @timn The pulse counter doesn't timeout on startup

What would be the preferred behaviour here? Just send a 0 value on boot or timeout after the timeout on boot and send 0 I opted not to send 0 when starting up as I didn't know the state of the pulse meter and therefore have no information about the current pulse rate. However, if the behaviour should be to timeout after timeout time has passed after startup without a pulse that can be fixed.

TrentHouliston commented 8 months ago

I have made a PR that will timeout the pulse meter on startup here esphome/esphome#5388

timn commented 8 months ago

I would not want a 0 to be sent. This is a total increasing counter, so a state of 0 would give wrong results (I have seen this for other sensors in HA and this would cause the entire total to be counted for each period (e.g., day). This is why in the config I linked I restore the value from flash on boot. Do you have a spare board and sensor to test that config?

TrentHouliston commented 8 months ago

I was referring to publishing 0 for the pulse rate sensor rather than the total sensor. I agree for the total sensor that posting 0 on startup would cause problems, especially if your intention is to restore the value from flash.

What was the behaviour that it had previously for you with the total sensor? As far as I can tell there should be no difference in behaviour for the output of the total sensor. The old code also just published a new update when it received a pulse and wouldn't have published anything prior to that.

fabianschaefer commented 8 months ago

@fabianschaefer The timeouts are not working as expected

@TrentHouliston So based on your data the penultimate pulse is where the timeout is being measured from rather than the last actual pulse? I'll take a look but I'm struggling to see a mechanism in the code for that to happen.

Yes that‘s correct. I‘ve now just doubled the timeout time in my config as a workaround, so it‘s not super urgent. Maybe the problem resolves itself when the root cause of some other issues here is fixed.

Goodwillson commented 8 months ago

I am getting also extreme sensor reads for wind speed after upgrading from 2023.5.5 to 2023.8.3. The wind speed goes up to 10000 km/h. I upped internal_filter to 100us, but I guess it may affect readings at higher wind speeds with such high limit. The extreme high wind speeds got filtered, but time to time I got speeds over 100km/h even at 100us. Is there any way to make the configuration compatible with new pulse meter?

- platform: pulse_meter
    pin:     
      number: GPIO25
      mode: INPUT_PULLUP
    id: wind_speed
    unit_of_measurement: 'm/s'
    name: "${friendly_name} wind speed"
    icon: 'mdi:weather-windy'
    internal_filter: 13us
    timeout: 5s
    filters:
      - multiply: 0.005560619
      - sliding_window_moving_average:
          window_size: 20
          send_every: 20
OBerghmans commented 8 months ago

I got similar results. Since 2023.8 I am measuring significantly more water usage via the config below. The water flow rate changed from around 10 liter/min to 130 liter/min (while my maximum possible in the house is 12 liter/min), and a daily usage from 150 liter to 1300 liter. I have to reset my Home Assistant utility meter for water every few days (subtracting about 3m³ each time).

I haven't played around with the parameters yet, but yesterday I compiled the 2023.7.1 firmware and flashed that to my board - as timn mentioned last week - as a temporary workaround.

  - platform: pulse_meter
    id: pulse_meter_water
    pin: ${ir_digital_pin}
    unit_of_measurement: "L/min"
    icon: "mdi:water"
    name: "Water Flow Rate"
    internal_filter: 150ms
    internal_filter_mode: PULSE
    accuracy_decimals: 3
    timeout: 30s
    total:
      name: "Water Usage"
      unit_of_measurement: "m³"
      icon: "mdi:water"
      device_class: water
      state_class: total_increasing
      accuracy_decimals: 3
      filters:
        - multiply: 0.001
OBerghmans commented 8 months ago

I still had big spikes with the new version (2023.9.1) so reverted again to 2023.7.1. I did run only a day and a half on the new version so I cannot say whether it is 'better' than previously, but after seeing spikes to 25 and 100l/min I decided to roll back. I can test if needed, if someone has some pointers and test scenarios.

TrentHouliston commented 8 months ago

Hi @OBerghmans I have a branch that logs every interrupt that occurs which is useful for working out what is going on. The only issue I had seen before this was related to noisy interrupt pulses that happened too fast for the ESP to see which resulted in lower than expected values rather than higher.

I merged in the latest changes but haven't had a chance to test it yet again myself. So let me know if this works and if you get any interrupt data that will help identify why it's giving you such a high rate.

external_components:
  - source: github://TrentHouliston/esphome@log_interrupts
    components: [ pulse_meter ]
OBerghmans commented 8 months ago

Thanks for the pointers @TrentHouliston ! I flashed the firmware, but I had to adjust the header file as I was getting compilation errors. A PR to your branch is created :-)

I will keep the logs open to see if I can get relevant interrupt data. Hopefully I get lucky because I have to keep it open in the browser (as my device is not connected via USB so I cannot check the serial connection). Or is there another way?

TrentHouliston commented 8 months ago

I've never tried this for longer periods myself, but if you have a computer you can run a command on and leave it you could run.

wget --user <web_user> --ask-password http://<your_device>.local/events

replacing and with whatever you configured as the web user in your device and with the device IP assuming you have the web_server component That just saves the log messages to a file, although the format isn't the nicest it's nothing a bit of cleaning can't fix

OBerghmans commented 7 months ago

Just a small update: I am still struggling with the amount of logs that it generates (one night is around 270mb). I hooked it up to a computer to save the logs that way. The ESP32 board also seems to reset regularly and I suspect it might be due to the logging.

My board is hooked up with a TCRT5000 sensor, which monitors my water meter: the meter has a small rotating disc which is half red/half "silver". The sensor returns high when it sees the silver part, and low when it sees the red part.

My sensor seems to sense the high and low quite good, but there's an area in between where it seems to flap between high and low rather fast. That gets solved by the internal_filter, but the logs get excessive. Here's a small excerpt, but I'm trying to log until I see such a huge spike.

[08:09:34][D][pulse_meter:046]: ISR: 4193319464 0 1 0 1 1
[08:09:34][D][pulse_meter:046]: ISR: 4193332279 0 1 0 1 1
[08:09:34][D][pulse_meter:046]: ISR: 4193335984 0 1 0 1 1
[08:09:34][D][pulse_meter:046]: ISR: 4193337766 0 1 0 1 1
[08:09:34][D][pulse_meter:046]: ISR: 4193351668 0 1 0 1 1
[08:09:34][D][pulse_meter:046]: ISR: 4193355420 0 1 0 1 1
[08:09:34][D][pulse_meter:046]: ISR: 4193357160 1 0 0 1 1
[08:09:34][D][pulse_meter:046]: ISR: 4193369661 0 1 0 1 1
[08:09:34][D][pulse_meter:046]: ISR: 4193371691 1 0 0 1 1

Long story short, still hunting after quality logs :-) I might adjust the cpp file to only log when the previous line was different, because as you can see the first 6 lines are the same (apart from the timestamp).

TrentHouliston commented 7 months ago

I know, the level of logging that branch does is excessive which is why I never actually put it into the main code even as a verbose log. And yes the pulse filter is very good at that kind of noise since the state needs to be stable for filter_length before it counts. If you do find a spike with logs that would be amazing to see.

Goodwillson commented 7 months ago

I included your logging component into my weather station, but ISR shows just zeros, despite I noticed about two or three huge spikes in wind speed. Filtered out whole log and there are only zeros logged everytime, but I see that pulse_meter component is taking long time to finish operation for some reason. I got that warning about 180x during the logging, but I noticed only about 2-3 spikes in wind data, so not sure if thats related.

id: 568830
event: log
data: [D][pulse_meter:038]: ISR: 568824689 0 0 0 0 0

id: 568836
event: log
data: [D][sensor:094]: 'METEO  wind pulses total': Sending state 4714.00000 pulses with 0 decimals of accuracy

event: state
data: {"id":"sensor-meteo__wind_pulses_total","value":4714,"state":"4714 pulses"}

id: 569006
event: log
data: [D][pulse_meter:038]: ISR: 568996517 0 0 0 0 0

id: 569011
event: log
data: [D][pulse_meter:038]: ISR: 568996549 0 0 0 0 0

id: 569024
event: log
data: [D][sensor:094]: 'METEO  wind pulses total': Sending state 4716.00000 pulses with 0 decimals of accuracy

event: state
data: {"id":"sensor-meteo__wind_pulses_total","value":4716,"state":"4716 pulses"}

id: 569033
event: log
data: [D][sensor:094]: 'METEO wind speed': Sending state 558.67969 m/s with 2 decimals of accuracy

id: 569033
event: log
data: [D][sensor:094]: 'METEO wind speed': Sending state 558.67969 m/s with 2 decimals of accuracy

id: 569049
event: log
data: [D][sensor:094]: 'METEO wind speed (km/h)': Sending state 2011.24683 km/h with 2 decimals of accuracy

id: 569052
event: log
data: [D][sensor:094]: 'METEO wind gust 60s': Sending state 2011.24683 km/h with 1 decimals of accuracy

event: state
data: {"id":"sensor-meteo_wind_gust_60s","value":2011.247,"state":"2011.2 km/h"}

id: 569069
event: log
data: [D][sensor:094]: 'METEO wind gust 10min': Sending state 2011.24683 km/h with 1 decimals of accuracy

event: state
data: {"id":"sensor-meteo_wind_gust_10min","value":2011.247,"state":"2011.2 km/h"}

event: state
data: {"id":"sensor-meteo_wind_gust_10min","value":2011.247,"state":"2011.2 km/h"}

event: state
data: {"id":"sensor-meteo_wind_speed_kmh","value":2011.247,"state":"2011.25 km/h"}

event: state
data: {"id":"sensor-meteo_wind_speed","value":558.6797,"state":"558.68 m/s"}

id: 569093
event: log
data: [W][component:204]: Component pulse_meter.sensor took a long time for an operation (0.09 s).

id: 569105
event: log
data: [W][component:205]: Components should block for at most 20-30ms.

id: 569304
event: log
data: [D][sensor:094]: 'METEO wind speed average': Sending state 281.53098 m/s with 2 decimals of accuracy

id: 569319
event: log
data: [D][sensor:094]: 'METEO wind speed average (km/h)': Sending state 1013.51147 km/h with 2 decimals of accuracy

My setup was following:

TrentHouliston commented 7 months ago

That's really odd, any time a value is published (that isn't a timeout) it should have a corresponding ISR state. It's possible that the pulse rate is too high and the ESP can't keep up with the logging.

Also all 0s in the ISR implies that both the current and previous value detected in the ISR (that's supposed to fire on change) are 0. So changing from 0 to 0... The only time I've seen that happen is when the signal is really noisy, and even then the pulse filter is coded to treat that kind of same value noise as a reset which should lower the rate

OBerghmans commented 7 months ago

I finally had some good readings this evening. From what I see in my logs I might just need to increase the internal_filter, but I will link the logs so maybe you can determine something else:

Hopefully that information can help, and I really appreciate your help with this!

elpirato83 commented 7 months ago

The problem

Greetings.

Upgrading from 2023.7.1 to 8.0, the pulse_meter function seems to have broken/changed. The below now causes the counter to go crazy and increment at a high rate endlessly. Removing the internal_filter causes the initial count after reset to be a randomly high number. The pulse count afterwards counts, though is several orders of magnitude higher than actual. Removing the debounce filters also causes endless increments.

I have a separate ESP8266 on an identical but separate water meter running the same code that no longer reports its count. The sensor shows as "unknown".

Please let me know how I can help.

Which version of ESPHome has the issue?

2023.8.0

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

2023.8.2

What platform are you using?

ESP8266

Board

No response

Component causing the issue

No response

Example YAML snippet

sensor:    
  - platform: pulse_meter
    pin: 
      number: GPIO5  #D1
      mode:  
        input: true
    name: "Pulse Meter"
    unit_of_measurement: "gal/min"
    internal_filter: 50ms   
    internal_filter_mode: EDGE
    accuracy_decimals: 1
    timeout: 10sec
    icon: "mdi:gauge"    
    filters: 
     - multiply: 0.0748052    
     - debounce: 50ms      
    total:
      name: "Total"
      accuracy_decimals: 3
      unit_of_measurement: "gal"
      device_class: water
      state_class: total_increasing
      filters: 
        - multiply: 0.0748052
        - debounce: 50ms

Anything in the logs that might be useful for us?

No response

Additional information

No response

I had exactly the same issue on my esp32 water meter board. I had to add the code below to prevent the pulse meter going crazy. After upgrading to 2023.8 the Esp32 baord was picking up a lot of noise on the input pin which caused this.

pin: mode: input: true pullup: true

I used pullup because I have a NPN sensor on my water meter. If you use a PNP sensor then you have use pulldown: true.

CestLaGalere commented 7 months ago

@TrentHouliston - are you looking at this at all please? Your August refactor seems to have caused a number of issues which are still outstanding. Thanks

TrentHouliston commented 7 months ago

I am still looking into this as I get time however as I don't seem to get any of the issues described It's very hard for me to work out what is wrong.

From what I'm seeing it looks like it's the result of very noisy data which is what the filter is for and I'm trying to work out the mechanism that this noise is slipping through. The code that does the pulse filtering isn't that long so there aren't that many places something could happen. I'm wondering if some other change somewhere else changed something which is showing up as errors here, or if it's something to do with this new filtering code.

  // Get the current time before we do anything else so the measurements are consistent
  const uint32_t now = micros();
  const bool pin_val = sensor->isr_pin_.digital_read();

  // A pulse occurred faster than we can detect
  if (sensor->last_pin_val_ == pin_val) {
    // If we haven't reached the filter length yet we need to reset our last_intr_ to now
    // otherwise we can consider this noise as the "pulse" was certainly less than filter_us_
    if (now - sensor->last_intr_ < sensor->filter_us_) {
      sensor->last_intr_ = now;
    }
  } else {
    // Check if the last interrupt was long enough in the past
    if (now - sensor->last_intr_ > sensor->filter_us_) {
      // High pulse of filter length now falling (therefore last_intr_ was the rising edge)
      if (!sensor->in_pulse_ && sensor->last_pin_val_) {
        sensor->last_edge_candidate_us_ = sensor->last_intr_;
        sensor->in_pulse_ = true;
      }
      // Low pulse of filter length now rising (therefore last_intr_ was the falling edge)
      else if (sensor->in_pulse_ && !sensor->last_pin_val_) {
        sensor->set_->last_detected_edge_us_ = sensor->last_edge_candidate_us_;
        sensor->set_->count_++;
        sensor->in_pulse_ = false;
      }
    }

    sensor->last_intr_ = now;
    sensor->last_pin_val_ = pin_val;
  }
fabianschaefer commented 7 months ago

@TrentHouliston, if you have issues reproducing the issue, maybe it could be a start to look into the timeout issue that I mentioned above (https://github.com/esphome/issues/issues/4807#issuecomment-1685134880) This one is very easily reproducible and maybe the two issues are linked. So if you solve the root cause of the timeout issue, the other issue might be solved as well.

CestLaGalere commented 7 months ago

@TrentHouliston thanks - I am not sure it is just to do with the noisy data issue - I have a pulse monitor attached to a photo diode sensor board (as an example https://ardubotics.eu/en/sensors/1164-lm393-photodiode-sensor-module.html - Whilst I have not put an oscilloscope on the output as it is electronically generated it ought to be fairly clean! I have just rolled it back and can confirm that 2023.7.1 gives the correct output (in my case I am measuring light pulses from an electricity meter) and have a ct clamp at the moment which I can compare it to. Moving to 2023.8.0 or later the reading goes from about .5kW up to 170kW). The meter gives one light flash every second or two, so pulses of low frequency. If there is any debug info I can provide that might help please let me know.

felixstorm commented 7 months ago

@TrentHouliston First of all, thanks for taking the time and effort to clean up the old pulse_meter code. I am trying to understand your new code and potentially help out with at least some of the issues mentioned here but need to be sure on one question first: In mode PULSE, should a pulse be regarded as completed on a rising edge or on a falling edge, i.e. are we counting low pulses (completed when rising again) or high pulses (completed when falling again)?

The docs do not seem to really be precise on the expected behavior but in one place talk about a "high pulse". But in your current code the pulse gets counted on a rising edge, i.e. it looks like we would actually count low pulses (completed when signal is rising again) - that's why I am asking.

Also, when looking at the history of your PR, it seems like counting had been switched from falling edge to rising edge at some point on the way without that fact being mentioned in the comments thread: https://github.com/esphome/esphome/pull/4231/commits/80e565f2abb59df28018ef5283f89e6fe13c2fb1#diff-2421f768261b150494a9645d18fee16f9463486eb5c1f849b9f8f7ab8c96e283L107-R115

TrentHouliston commented 7 months ago

Hi @felixstorm that is one of the parts when writing the code that caused some confusion for me too during the review process. I tried to comment on it but I guess it wasn't clear enough.

The reason that the pulse gets counted on the rising edge is that is the first time that you can know that there has been a low signal of sufficient length so you're soft counting on a falling edge, then validating that falling edge on the next rising edge.

The pulse starts on a rising edge and then is completed on the following falling edge, however until we get another rising edge where we can see that the value was low for at least filter_width we can't say that the previous high to low was valid.

It's for filtering out the noise case where say you have a rising edge, then stay high for filter width, then have a very short low then high (noise), in that case since the low pulse wasn't filter_width it doesn't count as the "falling edge"

So the process goes like this

So the candidate that's published is the time of the previous rising edge, so it's counting the transition to a high.

Now that I think about it though, nothing is stopping you from publishing that candidate to the main loop as soon as you identify it. It's just the filter itself that needs to track if the low pulse exists to reset for the next rising edge. The only effect of delaying until the filter resets causes is a half-cycle delay before you get information

felixstorm commented 7 months ago

@TrentHouliston Thanks for your reply - that makes things a lot clearer for me.

IMHO the delay caused by having to wait until the next rising edge is something that‘s not desirable at all for sources with lower pulse count (e.g. water flow based on 1-liter-pulses from a traditional water meter) as it means that in reality the pulse_meter will always show the flow one interval late - in fact that was the exact reason why I started looking at the code details in the first place.
But in contrary to the delay_on_off filter (which can act right after a delay has passed without having to wait for another pulse at all) the current ISR-based solution only can act if the ISR gets triggered by another edge (which might be very late). As you say it might be worth trying to push that candidate to the main loop immediately and let the loop process it as soon as enough time has passed since the pulse has fallen without rising again…

Looking at the different reports here in detail it seems that most of the people with trouble are (implicitly) using EDGE mode (besides @OBerghmans - but if I am not mistaken he mentions that he should be able to fix his setup by incrementing the filter duration) - which uses a different ISR code path. I also tried edge mode and noticed unreal high results but immediately switched to pulse mode. I can try to debug into edge mode again a bit more, but maybe PULSE mode should become the default (after potentially changing it to count pulses sooner as discussed above) since it now is the better and more error-proof setting?

I will think about it a bit more and see if I can come up with some other helpful idea.

felixstorm commented 7 months ago

About ISR logging: In my tests I utilized a FreeRTOS queue for ISR logging and I should be able to implement it in a way that should have almost zero impact outside of VERBOSE or VERY_VERBOSE logging. I might bring this up as a PR for permanent inclusion in main line code so people can debug their setup more easily (without impacting other „normal“ users).

felixstorm commented 7 months ago

One more thing: PULSE mode often seems to discard the very first pulse after a restart (which messes up the counter). Setting last_pin_val_ to the opposite of the current pin state in setup() fixed that for me - I could bring this up as a PR as well…

TrentHouliston commented 7 months ago

About ISR logging: In my tests I utilized a FreeRTOS queue for ISR logging and I should be able to implement it in a way that should have almost zero impact outside of VERBOSE or VERY_VERBOSE logging. I might bring this up as a PR for permanent inclusion in main line code so people can debug their setup more easily (without impacting other „normal“ users).

If you can find a way to include it without making the code much more complicated that would be great. Then it would be easy to just turn it on with very verbose logging. The main reason I didn't include it originally was that I wanted to keep the ISRs as minimal as possible as how long those ISRs take to run directly impacts how well the filter can operate. If they take too long and pulses/noise start building up faster than the ISR can process that's when you can start getting weird behaviour.

One more thing: PULSE mode often seems to discard the very first pulse after a restart (which messes up the counter). Setting last_pinval to the opposite of the current pin state in setup() fixed that for me - I could bring this up as a PR as well…

That might be a good idea, it should help a lot of the time if the signal is clean enough. Make sure you set it as close as possible to where the ISR itself gets attached to limit the chance of the pin changing

Also if you are going to look at the pulse filter itself how I started was just listing all the options like in the code below and working out what to do in each, then reducing the logic down after.

Code ```cpp // This is an interrupt handler - we can't call any virtual method from this method // Get the current time before we do anything else so the measurements are consistent const uint32_t now = micros(); const bool pin_val = sensor->isr_pin_.digital_read(); const bool length_y = now - sensor->last_intr_ >= sensor->filter_us_; const bool length_n = !length_y; const bool pulse_y = sensor->in_pulse_; const bool pulse_n = !pulse_n; const bool change_y = sensor->last_pin_val_ != pin_val; const bool change_n = !change_y; const bool hi = pin_val; const bool lo = !pin_val; if (false) { } else if (pulse_y && len_y && change_y && hi) { // 1 1 1 1 } else if (pulse_y && len_y && change_y && lo) { // 1 1 1 0 } else if (pulse_y && len_y && change_n && hi) { // 1 1 0 1 } else if (pulse_y && len_y && change_n && lo) { // 1 1 0 0 } else if (pulse_y && len_n && change_y && hi) { // 1 0 1 1 } else if (pulse_y && len_n && change_y && lo) { // 1 0 1 0 } else if (pulse_y && len_n && change_n && hi) { // 1 0 0 1 } else if (pulse_y && len_n && change_n && lo) { // 1 0 0 0 } else if (pulse_n && len_y && change_y && hi) { // 0 1 1 1 } else if (pulse_n && len_y && change_y && lo) { // 0 1 1 0 } else if (pulse_n && len_y && change_n && hi) { // 0 1 0 1 } else if (pulse_n && len_y && change_n && lo) { // 0 1 0 0 } else if (pulse_n && len_n && change_y && hi) { // 0 0 1 1 } else if (pulse_n && len_n && change_y && lo) { // 0 0 1 0 } else if (pulse_n && len_n && change_n && hi) { // 0 0 0 1 } else if (pulse_n && len_n && change_n && lo) { // 0 0 0 0 } ```
OBerghmans commented 7 months ago

To follow up: I am indeed using PULSE mode. And I did increase the filter duration. That did not completely solve it. I just don't have any high water flow rate readings anymore (max 10 liter per minute), but it just keeps going up slowly in certain cases. For example: if I would believe the sensor, then we would have used 200l water during the night while everyone was in bed 😏

So I am still following along and will also check the code. Feel free to add me in PRs for reviewing.

OBerghmans commented 7 months ago

Just as some clarification and/or proposal to change the 3-deep nested if structure, I rewrote the pulse_intr method just for my understanding. So anyone reading this, this is my own snippet and not the real code:

    // Get the current time and pin value
    const uint32_t now = micros();
    const bool pin_val = sensor->isr_pin_.digital_read();

    // If the pin value hasn't changed and the pulse is too short, treat as noise and reset last_intr_
    if (sensor->last_pin_val_ == pin_val) {
        if (now - sensor->last_intr_ < sensor->filter_us_) {
            sensor->last_intr_ = now;
        }
        return; // Early exit for noise or a very fast pulse
    }

    // Handle rising and falling edges
    if (now - sensor->last_intr_ > sensor->filter_us_) {
        if (!sensor->in_pulse_ && sensor->last_pin_val_) {
            // Detected rising edge
            sensor->last_edge_candidate_us_ = sensor->last_intr_;
            sensor->in_pulse_ = true;
        } 
        else if (sensor->in_pulse_ && !sensor->last_pin_val_) {
            // Detected falling edge
            sensor->set_->last_detected_edge_us_ = sensor->last_edge_candidate_us_;
            sensor->set_->count_++;
            sensor->in_pulse_ = false;
        }
    }

    // Update last interrupt time and last pin value for changes
    sensor->last_intr_ = now;
    sensor->last_pin_val_ = pin_val;

@TrentHouliston Does this behave exactly the same or did I make a wrong assumption? The comments in the real code (also) did confuse me on what was considered the rising and falling edge. I did exit early in case the last_pin_val was the same as pin_val just so the remainder of the function is not as nested as it was.

TrentHouliston commented 7 months ago

https://godbolt.org/z/Yrq7zKzMr

Looks like it generates the exact same assembly, so the behaviour would be identical. I don't always like early exit codes as you're never sure where the function ends. Generally only put it right up the top for quick checks (which this would fall under) you would merge the two if statements though.

That said, trying to outsmart the compiler is never a good idea. Code that's easy to read for you and the compiler is better than clever code almost always :P

OBerghmans commented 7 months ago

Damn, how cool is that website?!

Anyway, I agree on the fact that early exits are confusing if they are all around. But - as you said - here it is a quick check to filter out fast pulses or noise so that the remainder of the function can focus on the "real task at hand". Merging the two if statements (without doing some other changes) would change behaviour though: we should run the rest of the code when last_pinval equals pin_val.

Apart from the changed complexity (although I personally like the early exit), we can also improve readability by finetuning the comments. Were those on point? I left out some parts to be more concise, which made it clearer for me. But perhaps I left out too much.

Perhaps I create a PR to your fork and we discuss inline there? Not forcing anything though, as making a PR does not have to mean really merging if we see that my proposed changes are no improvements.

CestLaGalere commented 7 months ago

I can confirm that changing the default setting in my case to

internal_filter_mode: PULSE

gives the expected output - I can try and get a trace of the input signal if that would help - the default value of EDGE breaks with the refactoring.

felixstorm commented 7 months ago

The pulse starts on a rising edge and then is completed on the following falling edge, however until we get another rising edge where we can see that the value was low for at least filter_width we can't say that the previous high to low was valid.

@TrentHouliston When thinking through it - why do we actually need to verify the following low period‘s length to be able to report the pulse? Since we anyway report the time of the first rising edge (and therefore do not really care about the pulse length), I would say that the pulse can safely be considered valid (and get reported to the loop) as soon as we receive any falling edge after filter_width has passed. If this falling edge was only short noise, the pulse might not yet be over (and we would need to deal with that inside the ISR), but IMHO that would not keep us from reporting it?

Now that I think about it though, nothing is stopping you from publishing that candidate to the main loop as soon as you identify it. It's just the filter itself that needs to track if the low pulse exists to reset for the next rising edge. The only effect of delaying until the filter resets causes is a half-cycle delay before you get information

And reading through your last paragraph, you probably meant to says the same thing with it 😔 - correct?

TrentHouliston commented 7 months ago

And reading through your last paragraph, you probably meant to says the same thing with it 😔 - correct?

Correct, I realised as I was reading the code there is no need for it. I should have written it to publish the moment I saw the falling edge. It had gone thorough a few iterations by that point and I must have missed that was an option.

@OBerghmans Yeah godbolt is really useful for trying to work out what code is doing. Helped me a lot when learning what types of things the compiler can optimise. At the end of the day if the assembly is the same it all comes down to readability and personal preference.

OBerghmans commented 7 months ago

@felixstorm Do you have your FreeRTOS queue code pushed somewhere? I'm interested so I could include it in my branch and further test my case.

felixstorm commented 7 months ago

I have not pushed it yet since I had been playing around also with the filter, but I should hopefully be able to push it tomorrow during the day (CET). It is not magic but a bit tricky since only ESP8266 and ESP32 are based on FreeRTOS but not the other cores supported by ESPHome and therefore it should only be enabled if FreeRTOS is available to avoid compilation errors on other platforms.

BTW - the ISRs on any edge did not seem to be really stable during my tests on an ESP32: I had situations where the pin never read 1 after an ISR and therefore no pulse got detected even though a button had been pressed for several seconds. It might have been that the signal was very noisy and the pin had gone back to 0 for a short moment again but since we can only sample the pin state right in the ISR with the current concept I have no real idea how to deal with those situations. Also it does not seem possible to attach separate ISRs at the same time to rising and falling edges either (to get the pin state based on the specific ISR that got called) :-(

felixstorm commented 7 months ago

@TrentHouliston @OBerghmans Ok, I just now found some time to push my work-in-progress with ISR debug logging (and initial setting of last_pin_val_ to not miss the first pulse): https://github.com/felixstorm/esphome/commits/dev-felixstorm

I will probably not be able to do much further work on this during the next few days, so please feel free to copy the code for a PR etc.

TrentHouliston commented 6 months ago

@fabianschaefer Just FYI with the help of https://github.com/esphome/esphome/pull/5813 I've realised what is likely causing your timeouts from the penultimate pulse problem. As I described in that PR I think there is a solution that ~I'll either try to find time to implement sometime~ or help with that PR to implement the fix.

I have thrown together some code to show the idea (I haven't even compiled it yet, will try testing it out later) https://github.com/esphome/esphome/compare/dev...TrentHouliston:esphome:asap_pulses

petewill commented 4 months ago

For the last couple of weeks, I have been reading through this and other issues/posts trying to find a fix for my pulse meter. I have a water meter (DAE VM-75P) that uses a dry contact sensor to send a pulse. The problem is the signal is noisy and the contact sensor stays high half of the time and low the other half. So, it's high 0-0.5 gal and low 0.5-1 gal. This means it could be high or low for hours if we're not using water. All that to say it doesn't pulse quickly even when we're using water. If we are using the maximum amount of water we could, the fastest it would pulse is about once every 2 seconds. I first tried setting the internal_filter_mode to Edge and that resulted in many more pulses than there actually were. When I set it to Pulse it doesn't pick up all my pulses. It lags behind about 9 pulses every 100 pulses. I tried with @hannesb's code mentioned in this issue https://github.com/esphome/esphome/pull/5813#issuecomment-1828743336 and I am getting similar results. Here is my code:

sensor:
  - platform: pulse_meter
    pin:
      number: 33
      inverted: true
      mode:
        input: true
        pullup: true
    name: "Water Usage Rate"
    id: water_meter
    device_class: water
    unit_of_measurement: "gal/min"
    icon: "mdi:water"
    internal_filter: 200ms
    internal_filter_mode: pulse
    timeout: 30 min
    accuracy_decimals: 4
    total:
      name: "Water Meter Total"
      unit_of_measurement: "gal"
      id: water_total
      device_class: water
      state_class: total_increasing
      accuracy_decimals: 0
      filters:
        - lambda: |-
            id(total_pulses) = id(water_total).state;
            //ESP_LOGI("main", "total_pulses value: %f", id(total_pulses));
            id(flash_led).turn_on();
            return x;

I have tried adjusting the "internal_filter" without any luck.

For reference, I have set up a test binary sensor and it worked flawlessly for the couple of days that I tested it. Here was my code for that:

binary_sensor:
  - platform: gpio
    pin:
      number: 33
      inverted: true
      mode:
        input: true
        pullup: true
    id: water_counter
    internal: true
    on_press:
      then:
        - lambda: |-
            id(total_pulses) += 1;
            ESP_LOGI("main", "total_pulses value: %f", id(total_pulses));
            id(flash_led).turn_on();
    filters:
      - delayed_on: 200ms
      - delayed_off: 200ms

Does anyone have any ideas for how I can improve my results? I tried a few external debounce methods (add a capacitor and resistors of various sizes) but that didn't help either.

TrentHouliston commented 4 months ago

I made a PR about 3 weeks ago to fix the timeout issue mentioned by https://github.com/esphome/esphome/pull/5813 that would help with some of the timeout issues that occur which might make it miss pulses events if they're quite slow.

https://github.com/esphome/esphome/pull/6014

Nobody has had a look at it yet and I have difficulty testing it because none of my sensors I have access to really warrant a pulse filter currently.

Although the total should still be correct even if due to timeouts there are less actual events so the fact you're missing 9/100 is still concerning. The only possible case I've seen implied by the other issues I've seen over time is that sometimes if the pluses change faster than the ISR it seems that it gets into weird states.

I just had an idea for something that might help so I'll make a quick branch off 5813 so you can test it if you'd like. https://github.com/TrentHouliston/esphome/tree/loop_pulse_isr That branch wouldn't be a solution that I'd actually implement though. Putting a loop into an ISR is icky. However if it solves or even helps your problem it implies that the ISRs/pin reads aren't syncing well at all