custom-components / feedparser

📰 RSS Feed Integration
MIT License

No more images after update 0.1.8 -> 0.1.11 #78

Open BebeMischa opened 1 year ago

BebeMischa commented 1 year ago

Hello guys and girls,

I've just updated from 0.1.8 to 0.1.11 and now there are no more images in my feed. Is there something I need to change, or is it a bug?

My feed sensor:

```yaml
sensor:
  - platform: feedparser
    name: Het nieuws
    feed_url: 'https://www.nu.nl/rss'
    date_format: '%a, %d %b %Y %H:%M:%S %Z'
    scan_interval:
      minutes: 1
    inclusions:
      - title
      - link
      - description
      - image
      - pubDate
    exclusions:
      - language
```

Result:

[screenshot]

Before this update it worked fine...

zboersen commented 1 year ago

For anybody facing the same issue: I changed the code in the feedparser sensor, and it should now show both images and enclosures correctly. Maybe this can be added to the official release as well!

# Existing code

```python
if "image" in self._inclusions and "image" not in entry_value.keys():
    images = []
    if "summary" in entry.keys():
        images = re.findall(r"<img.+?src=\"(.+?)\".+?>", entry["summary"])
    if images:
        entry_value["image"] = images[0]
    else:
        if "links" in entry.keys():
            images = re.findall(
                '(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-&?=%.]+', str(entry["links"][1])
            )
        if images:
            entry_value["image"] = images[0]
        else:
            entry_value[
                "image"
            ] = "https://www.home-assistant.io/images/favicon-192x192-full.png"
```

# Modified code

```python
if "image" in self._inclusions and "image" not in entry_value:
    images = []
    # 1. Look for an <img src="..."> tag in the entry summary
    if "summary" in entry:
        images = re.findall(r"<img.+?src=\"(.+?)\".+?>", entry["summary"])
    # 2. Otherwise use the URL of the first enclosure, if the entry has one
    if not images and entry.get("enclosures"):
        enclosure_url = entry["enclosures"][0].get("url")
        if enclosure_url:
            images = [enclosure_url]
    # 3. Otherwise search the second link; skip entries with fewer than two links
    if not images and len(entry.get("links", [])) > 1:
        images = re.findall(
            r"(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-&?=%.]+",
            str(entry["links"][1]),
        )
    # 4. Take the first match, or fall back to the Home Assistant favicon
    entry_value["image"] = (
        images[0]
        if images
        else "https://www.home-assistant.io/images/favicon-192x192-full.png"
    )
```

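
To try the same lookup order outside Home Assistant, here is a minimal standalone sketch. `pick_image` and `DEFAULT_IMAGE` are hypothetical names, entries are plain dicts shaped like feedparser entries, the `<img>` regex is a slightly stricter variant of the one in the sensor, and the second-link fallback is left out for brevity. feedparser documents the enclosure URL under `href`, so the sketch checks that key first and `url` as a fallback.

```python
import re

# Placeholder the sensor code uses when no image is found
DEFAULT_IMAGE = "https://www.home-assistant.io/images/favicon-192x192-full.png"

# Captures the src attribute of the first <img> tag
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"')


def pick_image(entry: dict) -> str:
    """Hypothetical helper: choose an image URL for one parsed feed entry.

    Order: first <img src> in the summary, then the first enclosure
    that carries a URL, then the placeholder.
    """
    match = IMG_SRC.search(entry.get("summary", ""))
    if match:
        return match.group(1)
    for enclosure in entry.get("enclosures", []):
        # feedparser documents the key as "href"; "url" is accepted as well
        url = enclosure.get("href") or enclosure.get("url")
        if url:
            return url
    return DEFAULT_IMAGE


if __name__ == "__main__":
    entry = {
        "summary": "Short teaser text without markup",
        "enclosures": [{"href": "https://example.com/photo.jpg", "type": "image/jpeg"}],
    }
    print(pick_image(entry))  # https://example.com/photo.jpg
```

Keeping the selection in a pure function like this makes each fallback step easy to cover with small dict fixtures.
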
ogajduse commented 1 year ago

I am looking into it in my free time. I also plan to add tests for the integration in #80 to ensure that future releases do not break stuff. If anyone finds a way to fix it, do not hesitate to submit a PR.

BebeMischa commented 1 year ago

Thanks, @zboersen, it did the trick ;-)

ogajduse commented 1 year ago

I have the fix. I should publish a new feedparser version with the fix this weekend.

@BebeMischa @zboersen Could you please share the RSS feed URLs that you use and that contain images? That would help me in extending the test coverage.
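
For building test fixtures it helps to know where a given feed puts its images. A minimal sketch using only the standard library, assuming a plain RSS 2.0 document: `image_sources` is a hypothetical helper and `SAMPLE_RSS` is an invented fixture, not the nu.nl feed. For each item it reports whether the image is an inline `<img>` in the description, an `<enclosure>` with an `image/*` type, or missing.

```python
import re
import xml.etree.ElementTree as ET

# Invented RSS 2.0 fixture covering the three cases
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Inline image</title>
      <description>&lt;img src="https://example.com/inline.jpg" /&gt; Teaser</description>
    </item>
    <item>
      <title>Enclosure image</title>
      <description>Teaser only</description>
      <enclosure url="https://example.com/enclosure.jpg" type="image/jpeg" length="0" />
    </item>
    <item>
      <title>No image</title>
      <description>Plain text</description>
    </item>
  </channel>
</rss>"""

# Captures the src attribute of the first <img> tag in the unescaped description
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"')


def image_sources(rss_xml: "str | bytes") -> dict:
    """Map each item title to 'inline', 'enclosure' or None."""
    root = ET.fromstring(rss_xml)
    result = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # findtext returns the unescaped text, so &lt;img ...&gt; becomes <img ...>
        description = item.findtext("description", default="")
        enclosure = item.find("enclosure")
        if IMG_SRC.search(description):
            result[title] = "inline"
        elif enclosure is not None and enclosure.get("type", "").startswith("image/"):
            result[title] = "enclosure"
        else:
            result[title] = None
    return result


if __name__ == "__main__":
    print(image_sources(SAMPLE_RSS))
```

`ET.fromstring` also accepts bytes, so the raw body of a fetched feed can be passed in as-is.
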

ogajduse commented 1 year ago

#81 should fix this issue.

Could you please try the beta release I published and let me know whether images show up for you? https://github.com/custom-components/feedparser/releases/tag/0.2.0b0

If they do not show up, could you please provide the feed URL, so I can investigate?

Note: #78, #57 and #64 should be addressed and fixed by #81.