dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Designed for simplicity: simply monitor which websites had a text change, for free. Also covers website defacement monitoring and price change notification.
https://changedetection.io
Apache License 2.0
17.3k stars, 965 forks

LD-JSON price follower causing `'list' object has no attribute 'lower'` on one site #1833

Closed. Hexide closed this issue 1 year ago.

Hexide commented 1 year ago

Describe the bug: Some websites fail to scrape with the error 'list' object has no attribute 'lower'.

Version: v0.44.0 and v0.45.2

Steps to reproduce the behavior:

  1. Add a scrape for this test case: https://s3.hexide.com/public/2023/10/03/changedetectionio-testcase.html
  2. Wait for the scrape to finish.
  3. See the error 'list' object has no attribute 'lower'.

Additional context: The issue appears to be caused by an application/ld+json object whose @type key holds a list of strings instead of a single string.
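A minimal reproduction of the failure, assuming an ld+json payload shaped like the one described (the real test case's data is hidden, so the values below are hypothetical): when @type is a list, calling .lower() on it raises exactly the reported error.

```python
import json

# Hypothetical ld+json payload: "@type" is a list of strings, not a single string
ld_json_text = '{"@context": "https://schema.org", "@type": ["Product", "SubType"], "name": "Example"}'
json_data = json.loads(ld_json_text)

try:
    # Same expression as the html_tools.py condition: it assumes "@type" is a string
    json_data.get('@type', '').lower()
    error_message = None
except AttributeError as e:
    error_message = str(e)

print(error_message)  # 'list' object has no attribute 'lower'
```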

I was able to work around the issue by patching: https://github.com/dgtlmoon/changedetection.io/blob/master/changedetectionio/html_tools.py#L195-L196

Replacing

if json_data.get('@type', False) and json_data.get('@type','').lower() == ensure_is_ldjson_info_type.lower() and stripped_text_from_html:
    break

to

break
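JSON-LD itself allows @type to be either a single type or an array of types, so a tolerant check should accept both. Below is a sketch of such a comparison, in the spirit of the workaround above; ld_json_type_matches is a hypothetical helper name, not a function in changedetection.io. The original condition could call it in place of json_data.get('@type','').lower() == ensure_is_ldjson_info_type.lower().

```python
def ld_json_type_matches(json_data, expected_type):
    """Return True if the ld+json "@type" matches expected_type (case-insensitive).

    Hypothetical helper. Accepts "@type" as a string or a list of strings;
    anything else (missing key, numbers, nested objects) counts as no match.
    """
    if not isinstance(json_data, dict):
        return False
    declared = json_data.get('@type')
    if isinstance(declared, str):
        candidates = [declared]
    elif isinstance(declared, list):
        # Ignore non-string entries rather than crashing on them
        candidates = [t for t in declared if isinstance(t, str)]
    else:
        return False
    expected = expected_type.lower()
    return any(t.lower() == expected for t in candidates)


print(ld_json_type_matches({'@type': ['Product', 'SubType']}, 'product'))  # True
print(ld_json_type_matches({'@type': 'Product'}, 'Product'))              # True
print(ld_json_type_matches({'@type': 42}, 'product'))                      # False
```

Normalizing to a list first keeps a single comparison path for both shapes, and returning False on unexpected types means invalid markup is skipped instead of raising.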
dgtlmoon commented 1 year ago

Your markup also has a validation error according to https://validator.schema.org/:

SubType (The type SubType is not a type defined by the recognised schema (e.g. schema.org).)

Hexide commented 1 year ago

Yup, I completely understand that the example I provided might not be valid according to the spec. But it is copied and pasted from a website (with the data hidden) that I have no ability to change.

The expected outcome here would be that changedetection does its best to gather data and does not crash when the application/ld+json is invalid.

dgtlmoon commented 1 year ago

Yeah, agreed. I made the "detector" a little more specific about string/dictionary handling and wrapped it in try/except; we'll see if the tests pass.
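The actual change is not shown in this thread, so the following is only an illustration of the idea described: check explicitly whether each value is a string, list, or dictionary, and skip malformed blocks instead of failing the whole check. find_ld_json_of_type and its input (a list of raw script-tag contents) are hypothetical; extracting those tags from the HTML is left out.

```python
import json


def find_ld_json_of_type(script_texts, expected_type):
    """Return the first ld+json object whose "@type" matches expected_type, else None.

    Hypothetical sketch. script_texts holds the raw contents of
    <script type="application/ld+json"> tags.
    """
    expected = expected_type.lower()
    for raw in script_texts:
        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            # Malformed JSON (or non-string input): skip this block, keep going
            continue
        # A block may contain a single object or a list of objects
        objects = data if isinstance(data, list) else [data]
        for obj in objects:
            if not isinstance(obj, dict):
                continue
            declared = obj.get('@type')
            # Explicit about the two shapes seen in the wild: string or list of strings
            if isinstance(declared, str):
                declared = [declared]
            elif not isinstance(declared, list):
                continue
            if any(isinstance(t, str) and t.lower() == expected for t in declared):
                return obj
    return None


scripts = [
    '{not valid json',
    '{"@type": ["Product", "SubType"], "offers": {"price": "9.99"}}',
]
match = find_ld_json_of_type(scripts, 'product')
print(match['offers']['price'])  # 9.99
```

Here the first block is invalid JSON and is skipped, while the second still matches even though its @type is a list.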