danieldotnl / ha-multiscrape

Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
MIT License

Http request settings don't work for form_submit in v7.0.2 #355

Open sdrapha opened 2 months ago

sdrapha commented 2 months ago

Version of the custom_component 7.0.2

Configuration

This error originated from a custom integration.

Logger: custom_components.multiscrape.coordinator
Source: custom_components/multiscrape/coordinator.py:80
integration: Multiscrape scraping component (documentation, issues)
First occurred: 9:46:45 AM (6 occurrences)
Last logged: 10:03:31 AM

Scraper_noname_1 # Exception in form-submit feature. Will continue trying to scrape target page. [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for '#REDACTED#domain'. (_ssl.c:1000)
Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page. [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for '#REDACTED#domain'. (_ssl.c:1000)

Integration config:

```yaml
multiscrape:
  - resource: 'https://mocreo_hub.reverse_proxy.com/sensors'
    scan_interval: 300
    verify_ssl: false
    form_submit:
      resource: 'https://mocreo_hub.reverse_proxy.com/login'
      select: 'body > div > div > div > div > div.card-body > form'
      input:
        passwd: !secret mocreo_password
    sensor:
      - unique_id: mocreo_sensor_1_serial
        name: Mocreo sensor_1_serial
        device_class: temperature
        state_class: measurement
        unit_of_measurement: '°F'
        select_list: 'div:has(>.card-header):-soup-contains("sensor_1_serial")>div>div>div'
        value_template: '{{ value.split(",")[2].split("°")[0] | trim | float }}'
        attributes:
          - name: model
            select_list: 'div:has(>.card-header):-soup-contains("sensor_1_serial")>div>div>div'
            value_template: '{{ value.split(",")[0] | trim }}'
          - name: serialnumber
            select_list: 'div:has(>.card-header):-soup-contains("sensor_1_serial")>div>div>div'
            value_template: '{{ value.split(",")[1].split(":")[1] | trim }}'
          - name: temperature
            select_list: 'div:has(>.card-header):-soup-contains("sensor_1_serial")>div>div>div'
            value_template: '{{ value.split(",")[2].split("°")[0] | trim }}'
          - name: unit_of_measurement
            select_list: 'div:has(>.card-header):-soup-contains("sensor_1_serial")>div>div>div'
            value_template: '°{{ value.split(",")[2].split("°")[1] | trim}}'
  - resource: 'https://mocreo_hub.reverse_proxy.com/nodes'
    scan_interval: 1000
    verify_ssl: false
    form_submit:
      resource: 'https://mocreo_hub.reverse_proxy.com/login'
      select: 'body > div > div > div > div > div.card-body > form'
      input:
        passwd: !secret mocreo_password
    sensor:
      - unique_id: mocreo_sensor_1_serial_battery
        name: Mocreo sensor_1_serial battery
        device_class: battery
        state_class: measurement
        unit_of_measurement: '%'
        select_list: 'tr:-soup-contains("sensor_1_serial")>td'
        value_template: '{{ value.split(",")[3].split("%")[0] | trim | float }}'
        attributes:
          - name: type
            select_list: 'tr:-soup-contains("sensor_1_serial")>td'
            value_template: '{{ value.split(",")[0] | trim }}'
          - name: serialnumber
            select_list: 'tr:-soup-contains("sensor_1_serial")>td'
            value_template: '{{ value.split(",")[1] | trim }}'
          - name: battery
            select_list: 'tr:-soup-contains("sensor_1_serial")>td'
            value_template: '{{ value.split(",")[3] | trim }}'
          - name: signal
            select_list: 'tr:-soup-contains("sensor_1_serial")>td'
            value_template: '{{ value.split(",")[4] | trim }}'
          - name: online
            select_list: 'tr:-soup-contains("sensor_1_serial")>td>span.text-success'
            value_template: '{{ iif("" in value,"Online","Offline") }}'
          - name: last_seen
            select_list: 'tr:-soup-contains("sensor_1_serial")>td'
            value_template: '{{ value.split(",")[6] | trim }}'
          - name: version
            select_list: 'tr:-soup-contains("sensor_1_serial")>td'
            value_template: '{{ value.split(",")[7] | trim }}'
          - name: model
            select_list: 'tr:-soup-contains("sensor_1_serial")>td'
            value_template: '{{ value.split(",")[8] | trim }}'
```

Describe the bug

7.0.2 is broken; I had to go back to 7.0.1. With 7.0.2, I'm not getting past the login page: the logged response is the login form again, as if I didn't provide a valid password.

<div class="row">
  <div class="col-md-4 offset-md-4 col-12">
    <div class="card bg-light mb-3" style="">
      <div class="card-header">Hub Portal</div>
      <div class="card-body">
        <form action='/login' method='POST'>
          <div class="form-group ">
            <label for="un">User Name</label>
            <input name="user" id="un" class="form-control" value="admin" disabled>
            <input name="path" id="pa" class="form-control" value="/sensors" hidden>
          </div>
          <div class="form-group">
            <label for="passwd">Password</label>
            <input type="password" name="passwd" id="passwd" class="form-control" placeholder="Enter a password here" value="">
          </div>
          <button type="submit" class="btn btn-primary">Login</button>
          <div class="form-group"></div>
          <div class="alert alert-secondary" role="alert" style="font-size: 0.7em;">
          Tips: The password you entered is not your login password for <a href="https://portal.mocreo.com" target="_blank">MOCREO Portal</a>.
          </div>

Debug log


2024-04-06 09:51:51.793 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # New run: start (re)loading data from resource
2024-04-06 09:51:51.794 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Deleting logging files from previous run
2024-04-06 09:51:51.796 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Starting with form-submit
2024-04-06 09:51:51.797 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Requesting page with form from: https://redacted_domain/login
2024-04-06 09:51:51.797 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing form_page-request with a GET to url: https://redacted_domain/login with headers: {}.
2024-04-06 09:51:51.800 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # request_headers written to file: form_page_request_headers.txt
2024-04-06 09:51:51.802 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # request_body written to file: form_page_request_body.txt
2024-04-06 09:51:51.814 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Error executing GET request to url: https://redacted_domain/login.
 Error message:
 SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'redacted_domain'. (_ssl.c:1000)")
2024-04-06 09:51:51.815 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Unable to write headers and body to files during handling of exception.
 Error message:
 AttributeError("'NoneType' object has no attribute 'headers'")
2024-04-06 09:51:51.815 ERROR (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Exception in form-submit feature. Will continue trying to scrape target page.
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'redacted_domain'. (_ssl.c:1000)
2024-04-06 09:51:51.815 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Executing page-request with a GET to url: https://redacted_domain/sensors with headers: {}.
2024-04-06 09:51:51.819 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # request_headers written to file: page_request_headers.txt
2024-04-06 09:51:51.821 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # request_body written to file: page_request_body.txt
2024-04-06 09:51:53.992 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # Response status code received: 200
2024-04-06 09:51:53.995 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # response_headers written to file: page_response_headers.txt
2024-04-06 09:51:53.997 DEBUG (MainThread) [custom_components.multiscrape.http] Scraper_noname_0 # response_body written to file: page_response_body.txt
2024-04-06 09:51:53.997 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Loading the content in BeautifulSoup.
2024-04-06 09:51:54.004 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # page_soup written to file: page_soup.txt
2024-04-06 09:51:54.004 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Scraper_noname_0 # Data successfully refreshed. Sensors will now start scraping to update.
2024-04-06 09:51:54.004 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 2.210 seconds (success: True)
2024-04-06 09:51:54.004 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # Start scraping to update sensor
2024-04-06 09:51:54.005 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # List selector selected tags: []
2024-04-06 09:51:54.005 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # List selector csv: 
2024-04-06 09:51:54.005 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # Applying value_template on selector result
2024-04-06 09:51:54.005 ERROR (MainThread) [homeassistant.helpers.template] Template variable error: list object has no element 2 when rendering '{{ value.split(",")[2].split("°")[0] | trim | float }}'
2024-04-06 09:51:54.005 DEBUG (MainThread) [custom_components.multiscrape.form] Scraper_noname_0 # Exception occurred while scraping, will try to resubmit the form next interval.
2024-04-06 09:51:54.005 ERROR (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # Unable to scrape data: UndefinedError: list object has no element 2 
Consider using debug logging and log_response for further investigation.
2024-04-06 09:51:54.005 DEBUG (MainThread) [custom_components.multiscrape.sensor] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # On-error, set value to None
2024-04-06 09:51:54.005 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # Start scraping attributes
2024-04-06 09:51:54.006 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Mocreo 0030AEA4005AD000# json # List selector selected tags: []
2024-04-06 09:51:54.006 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Mocreo 0030AEA4005AD000# json # List selector csv: 
2024-04-06 09:51:54.006 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Mocreo 0030AEA4005AD000# json # Applying value_template on selector result
2024-04-06 09:51:54.006 ERROR (MainThread) [homeassistant.helpers.template] Template variable error: list object has no element 1 when rendering '{"serialnumber":"{{ value.split(',')[1].split(':')[1] | trim }}", "model":"{{ value.split(',')[0] | trim }}", "temperature":"{{ value.split(',')[2].split('°')[0] | trim }}", "unit_of_measurement":"°{{ value.split(',')[2].split('°')[1] | trim}}"}'
2024-04-06 09:51:54.006 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # json # Exception selecting attribute data: UndefinedError: list object has no element 1
2024-04-06 09:51:54.006 ERROR (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # json # Unable to extract data from HTML
2024-04-06 09:51:54.006 DEBUG (MainThread) [custom_components.multiscrape.entity] Scraper_noname_0 # Mocreo 0030AEA4005AD000 # json # On-error, set value to None
2024-04-06 09:51:54.007 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Mocreo 0030AEA4005AD000# model # List selector selected tags: []
2024-04-06 09:51:54.007 DEBUG (MainThread) [custom_components.multiscrape.scraper] Scraper_noname_0 # Mocreo 0030AEA4005AD000# model # List selector csv: 
sdrapha commented 2 months ago

Actually, after spending time filing the bug and reading through the debug logs, I figured out a workaround: include verify_ssl: false inside the form_submit block as well.

multiscrape:
  - resource: 'https://host/sensors'
    scan_interval: 300
    verify_ssl: false
    form_submit:
      verify_ssl: false

Nonetheless, that was a breaking change, and it's also undocumented in the README.

danieldotnl commented 2 months ago

You are right, that should have been mentioned and documented. I updated the release notes for those who still need to upgrade and will update the README accordingly. So your workaround is actually the solution.
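
For anyone landing here with the same error, this is a rough sketch of the first scraper from the report with the fix applied. It assumes only what the thread confirms: from 7.0.2 on, `verify_ssl` has to be set inside `form_submit` for the login request, in addition to the top-level setting for the target page. Other HTTP request settings may need the same treatment, but that is not confirmed in this thread. Attributes are left out for brevity.

```yaml
multiscrape:
  - resource: 'https://mocreo_hub.reverse_proxy.com/sensors'
    scan_interval: 300
    # Applies to the request for the target page (/sensors)
    verify_ssl: false
    form_submit:
      resource: 'https://mocreo_hub.reverse_proxy.com/login'
      # From 7.0.2 on, the form request needs its own setting
      verify_ssl: false
      select: 'body > div > div > div > div > div.card-body > form'
      input:
        passwd: !secret mocreo_password
    sensor:
      - unique_id: mocreo_sensor_1_serial
        name: Mocreo sensor_1_serial
        device_class: temperature
        state_class: measurement
        unit_of_measurement: '°F'
        select_list: 'div:has(>.card-header):-soup-contains("sensor_1_serial")>div>div>div'
        value_template: '{{ value.split(",")[2].split("°")[0] | trim | float }}'
```

With the login succeeding again, the selectors should match the real sensor page instead of the login form, which is what caused the "list object has no element 2" template errors in the debug log.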