Home Assistant custom component that scrapes multiple values (HTML, XML or JSON) from a single HTTP request, with a separate sensor or attribute for each value. Supports (login) form-submit functionality.
MIT License
Multiscrape succeeds initially, fails on subsequent attempts with 'tag not found' error #399
I'm using the multiscrape integration in Home Assistant to extract the text value of a selector from the direct.playstation website. It works successfully when I initially start Home Assistant. However, after the specified scan_interval elapses and multiscrape attempts to check for a new or the same value, the entity becomes unavailable. The logs show the error message: 'Unable to scrape data: Could not find a tag for given selector'. Why does this error occur if the scraping works correctly once after restarting Home Assistant? How can I resolve this issue to ensure consistent scraping?
Debug log
2024-08-13 16:27:36.341 INFO (MainThread) [homeassistant.components.sensor] Setting up multiscrape.sensor
2024-08-13 16:27:36.342 DEBUG (MainThread) [custom_components.multiscrape.sensor] HA scraper # Joystickmodule # Setting up sensor
2024-08-13 16:27:36.343 DEBUG (MainThread) [custom_components.multiscrape.sensor] HA scraper # Joystickmodule # Start scraping to update sensor
2024-08-13 16:27:36.354 DEBUG (MainThread) [custom_components.multiscrape.scraper] HA scraper # Joystickmodule # Tag selected:
Momenteel niet beschikbaar
2024-08-13 16:27:36.355 DEBUG (MainThread) [custom_components.multiscrape.scraper] HA scraper # Joystickmodule # Selector result: Momenteel niet beschikbaar
2024-08-13 16:27:36.355 DEBUG (MainThread) [custom_components.multiscrape.scraper] HA scraper # Joystickmodule # Final selector value: Momenteel niet beschikbaar of type <class 'str'>
2024-08-13 16:27:36.355 DEBUG (MainThread) [custom_components.multiscrape.sensor] HA scraper # Joystickmodule # Selected: Momenteel niet beschikbaar
2024-08-13 16:27:36.355 DEBUG (MainThread) [custom_components.multiscrape.entity] HA scraper # Joystickmodule # Updated sensor and attributes, now adding to HA
2024-08-13 16:37:36.593 DEBUG (MainThread) [custom_components.multiscrape.coordinator] HA scraper # New run: start (re)loading data from resource
2024-08-13 16:37:36.593 DEBUG (MainThread) [custom_components.multiscrape.http] HA scraper # Executing page-request with a GET to url: https://direct.playstation.com/nl-nl/buy-accessories/stick-module-for-dualsense-edge-wireless-controller with headers: {}.
2024-08-13 16:37:36.944 DEBUG (MainThread) [custom_components.multiscrape.http] HA scraper # Response status code received: 200
2024-08-13 16:37:36.944 DEBUG (MainThread) [custom_components.multiscrape.scraper] HA scraper # Loading the content in BeautifulSoup.
2024-08-13 16:37:36.950 DEBUG (MainThread) [custom_components.multiscrape.coordinator] HA scraper # Data successfully refreshed. Sensors will now start scraping to update.
2024-08-13 16:37:36.950 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 0.357 seconds (success: True)
2024-08-13 16:37:36.950 DEBUG (MainThread) [custom_components.multiscrape.sensor] HA scraper # Joystickmodule # Start scraping to update sensor
2024-08-13 16:37:36.951 DEBUG (MainThread) [custom_components.multiscrape.scraper] HA scraper # Joystickmodule # Tag selected: None
2024-08-13 16:37:36.951 ERROR (MainThread) [custom_components.multiscrape.sensor] HA scraper # Joystickmodule # Unable to scrape data: Could not find a tag for given selector
Consider using debug logging and log_response for further investigation.
2024-08-13 16:37:36.951 DEBUG (MainThread) [custom_components.multiscrape.sensor] HA scraper # Joystickmodule # On-error, set value to None
2024-08-13 16:37:36.951 DEBUG (MainThread) [custom_components.multiscrape.entity] HA scraper # Joystickmodule # Sensor updated and state written to HA
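The second run in the log above gets a 200 status but still reports `Tag selected: None`, so the status code alone does not show that the page contained the expected element. A minimal sketch of how two saved response bodies could be compared for the presence of an element, using only Python's standard library rather than multiscrape's own code. The helper `has_class`, the class name `stock-status`, and both HTML snippets are hypothetical; the configuration and real selector are not shown in this issue, and the "verification page" body is an assumption, not something the log confirms.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the wanted CSS class."""

    def __init__(self, wanted_class: str) -> None:
        super().__init__()
        self.wanted_class = wanted_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value and self.wanted_class in value.split():
                self.found = True


def has_class(html: str, wanted_class: str) -> bool:
    """Return True if any element in html has the given class."""
    finder = ClassFinder(wanted_class)
    finder.feed(html)
    return finder.found


# Hypothetical response bodies; both could be served with status 200.
first_fetch = '<div><span class="stock-status">Momenteel niet beschikbaar</span></div>'
second_fetch = "<html><body><p>Please verify you are human.</p></body></html>"

print(has_class(first_fetch, "stock-status"))
print(has_class(second_fetch, "stock-status"))
```

Running a check like this against the body that `log_response` saves for a failing run would show whether the element is actually absent from the returned page or whether the selector itself no longer matches.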
Version of the custom_component
v7.0.3
Configuration