danieldotnl / ha-multiscrape

Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
MIT License

Scrape action should not be performed during startup when scan interval set to 0 #359

Open Paul-Vdp opened 2 months ago

Paul-Vdp commented 2 months ago

Title says it all. My scrape sensor relies on 'params' for some parts of the resource. Because these params have not yet been initialized at startup, the (unwanted and unneeded!) scrape results in errors.

danieldotnl commented 2 months ago

What should the value of the sensors be on startup in this case?

Paul-Vdp commented 2 months ago

To be honest: I don't care, because I neither need nor use them at that moment. That's why their scan interval is set to 0 in the first place. Or maybe a more reasonable and acceptable answer would be: restored from their previous value, as with most other sensors? After all, the scrape that runs on startup has nothing to do with any need for these sensors to be refreshed or updated.

SeanPM5 commented 2 months ago

Would also like this. I use the resource_template: option in Multiscrape to form some URLs using an attribute from another integration. But because Multiscrape loads (and attempts to scrape) before that other integration has loaded and has a sensor value, the template renders a broken URL that results in a bunch of 404 and 500 errors every time on startup.
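The failure mode described here can be illustrated with a minimal sketch in plain Python. It is not Multiscrape's code: `render_resource`, `NOT_READY` and the example URL are hypothetical names chosen for illustration, and the only assumption taken from Home Assistant is that an entity that has not loaded yet reports the state `unknown` or `unavailable`.

```python
from typing import Optional

# States that mean "the other integration has no value yet".
# "unknown" and "unavailable" are the usual Home Assistant placeholder states.
NOT_READY = {None, "", "unknown", "unavailable"}


def render_resource(base_url: str, source_state: Optional[str]) -> Optional[str]:
    """Build the URL to scrape, or return None if the input is not ready.

    Hypothetical helper: without the guard, a state of "unknown" would be
    pasted into the URL and the request would fail with a 404 or 500.
    """
    if source_state in NOT_READY:
        return None  # caller should skip the scrape instead of requesting a broken URL
    return f"{base_url}/{source_state}"
```

With this guard, a startup scrape would simply be skipped while the other integration is still loading, rather than producing errors.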

IMO keep the default behavior as-is but introduce a new optional boolean like scrape_on_startup: false, so that it works regardless of the user's scan interval. The sensor state could be unknown, so the user knows that the Multiscrape integration is loaded but just hasn't performed a scrape yet.
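The option proposed in this comment could behave roughly as sketched below. This is plain Python under stated assumptions, not Multiscrape's implementation: `scrape_on_startup` is the hypothetical option suggested above, and `should_scrape_on_startup`, `initial_state` and `STATE_UNKNOWN` are names made up for illustration.

```python
from typing import Optional

# Placeholder state for "loaded, but nothing scraped yet", as suggested above.
STATE_UNKNOWN = "unknown"


def should_scrape_on_startup(config: dict) -> bool:
    """Return True unless the hypothetical scrape_on_startup option is false.

    Defaulting to True keeps today's behavior for existing configurations.
    """
    return bool(config.get("scrape_on_startup", True))


def initial_state(config: dict, scraped_value: Optional[str]) -> str:
    """State a sensor would report right after startup."""
    if should_scrape_on_startup(config) and scraped_value is not None:
        return scraped_value
    # No scrape was performed (or it produced nothing): report "unknown".
    return STATE_UNKNOWN
```

Because the default is True, the option would be independent of the scan interval and would not change anything for users who never set it.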

Paul-Vdp commented 2 months ago

Glad somebody agrees with my point. Although I beg to differ with the suggestions and stand behind my own, because:

1) Setting scan_interval to 0 is clearly meant to indicate that one wants to perform the scraping at one's own tempo, if and when needed, under the sole control of the user and his automations. It therefore should NOT be 'externally' forced at startup. Any other interpretation does not make sense, so I see no need for an additional setting.

2) The same reasoning goes for the sensor values on startup. Restarting Hass is not in any way an objective reason to change the values of these sensors from their previous state, which should therefore simply be retained. Or why would they have to be treated differently than e.g. the state of a light, or the state of a temperature sensor, etc.? I fail to see what influence a Hass restart could have on the content of the site we're scraping, and therefore on the values we're scraping from it. And in the rather unlikely case of an extremely volatile site, one can always self-initiate a scrape on startup ...
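The behavior argued for in this comment could be sketched as follows. This is a plain-Python illustration, not Multiscrape's code: `scrape_at_startup` and `startup_state` are hypothetical names, and `restored_state` stands in for whatever value the sensor had before the restart.

```python
from typing import Callable, Optional


def scrape_at_startup(scan_interval_seconds: int) -> bool:
    """Under the reading argued above, 0 means 'manual only': no startup scrape."""
    return scan_interval_seconds > 0


def startup_state(
    scan_interval_seconds: int,
    restored_state: Optional[str],
    scrape: Callable[[], str],
) -> Optional[str]:
    """Return the sensor state to use right after a restart."""
    if scrape_at_startup(scan_interval_seconds):
        return scrape()  # periodic sensors refresh immediately, as today
    # Manual-only sensors keep their previous value; scrape() is never called.
    return restored_state
```

Under this reading no extra setting is needed: the scan interval alone decides whether startup triggers a request.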

danieldotnl commented 2 months ago

I agree with @Paul-Vdp and I will work on implementing this. It's not a small feature request though, so it will take some time.

Paul-Vdp commented 2 months ago

Much obliged, @danieldotnl. I realize it is not a simple change, but I am confident you will manage ;-)

Paul-Vdp commented 3 weeks ago

@danieldotnl Any progress on this? Just asking ;-)

saulleighton23 commented 3 days ago

scan_interval: 0 will be a really useful feature when implemented - I wholly support it.