diffbot / rss-anything

Transform any old website with a list of links into an RSS Feed
MIT License

Some Issues RSS Anything #4

Open ptrailblazer89 opened 2 days ago

ptrailblazer89 commented 2 days ago

Hello, Diffbot,

The service helps a lot in providing RSS feeds for websites which don't have one, but the following problems were noted:

  1. Blog images aren't captured (https://politepol.com/en/ does this well). Images give insight into an article at a glance. Example: https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/bg-p/blog-ai

  2. Not everything is captured, and entries are not in chronological order: https://www.upwork.com/nx/search/jobs/?q=embedded

  3. Sometimes garbage gets captured along with the feed, and sometimes nothing gets captured at all. Titles are also not always captured correctly: dates are put in as titles and titles as text. Example: https://www.quectel.com/blog/

jeromechoo commented 2 days ago

Thanks for sharing these examples @ptrailblazer89.

  1. I've added image support to the feed. By the way, this site provides its own RSS feed that should be much faster but also lacks images. Might want to give that a shot.

  2. I see the issue here. The first entry (PCB Design and UI for a Canbus Motorsports Device) does not have a discrete date, so by default it is given today's date. "Last week" is too amorphous for Diffbot Extract to tell what day it is. I can't fix the date interpretation unfortunately, but this seems to only affect one of the entries.

  3. We'll review this in the base extraction model. Will report back when it's fixed.
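To illustrate the date problem in point 2, here is a minimal Python sketch of a fallback of that kind. It is not Diffbot Extract's actual logic; `resolve_posted_date` and the phrases it handles are hypothetical and only show why "Last week" ends up with today's date while a label such as "3 days ago" could be resolved.

```python
import re
from datetime import date, timedelta


def resolve_posted_date(label, today):
    """Hypothetical helper: map a 'posted' label to (date, is_exact).

    Labels that name a single day are resolved relative to `today`.
    Anything vaguer falls back to `today` and is flagged as inexact.
    """
    text = label.strip().lower()

    if text == "yesterday":
        return today - timedelta(days=1), True

    match = re.fullmatch(r"(\d+) days? ago", text)
    if match:
        return today - timedelta(days=int(match.group(1))), True

    # "last week" spans seven possible days, so there is no single
    # day to pick; default to today's date instead of guessing.
    return today, False


today = date(2024, 9, 17)
exact = resolve_posted_date("3 days ago", today)
vague = resolve_posted_date("Last week", today)
```

With this fallback, an entry labelled "Last week" is dated today and therefore sorts ahead of entries that carry a real date, which matches the ordering seen in the Upwork feed.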

ptrailblazer89 commented 2 days ago

Thanks for the quick response. The image problem still persists.

The Feeder Android app is unable to decode the images: https://play.google.com/store/apps/details?id=com.nononsenseapps.feeder.play&hl=en_IN&pli=1 https://github.com/spacecowboy/Feeder



TEST SITE: https://blogs.blackberry.com/en/home

Below is the feed generated from: https://rss.diffbot.com/atom?url=https://blogs.blackberry.com/en/home

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
<id>https://blogs.blackberry.com/en/home</id>
<title>BlackBerry Blogs</title>
<updated>2024-09-17T19:23:27.130771+00:00</updated>
<link href="https://rss.diffbot.com/rss?url=https://blogs.blackberry.com/en/home" rel="self"/>
<link href="https://blogs.blackberry.com/en/home" rel="alternate"/>
<generator uri="https://lkiesow.github.io/python-feedgen" version="1.0.0">python-feedgen</generator>
<icon>https://blogs.blackberry.com/etc.clientlibs/bb-spa-react/clientlibs/clientlib-react/resources/logo192.png</icon>
<subtitle>https://blogs.blackberry.com/en/home</subtitle>

<entry>
<id>https://blogs.blackberry.com/en/2024/09/top-multi-tenancy-console-cylance</id>
<title>Elevate Your IT Operations with the Updated Cylance Multi-Tenant Console</title>
<updated>2024-09-17T19:23:27.134618+00:00</updated>
<content>Announcing powerful updates we recently unveiled in the Cylance Multi-Tenant Console (MTC). It's the next step toward the future of IT Management.</content>
<link href="https://blogs.blackberry.com/en/2024/09/top-multi-tenancy-console-cylance"/>
<link href="https://images.blackberry.com/is/image/blackberry/multi-tenant-thumb-466x261?wid=466&amp;fmt=jpg"/>
<published>2024-09-13T00:00:00+00:00</published>
</entry>

<entry>
<id>https://blogs.blackberry.com/en/2024/09/memory-threat-detection</id>
<title>Detecting Threats in Memory: The Role of Advanced Sensors</title>
<updated>2024-09-17T19:23:27.134249+00:00</updated>
<content>Traditional methods often fail to detect memory-based cyberattacks. Advanced sensors that monitor and analyze memory are key to closing this gap.</content>
<link href="https://blogs.blackberry.com/en/2024/09/memory-threat-detection"/>
<link href="https://images.blackberry.com/is/image/blackberry/memory-attack-thumb-466x261?wid=466&amp;fmt=jpg"/>
<published>2024-09-12T00:00:00+00:00</published>
</entry>
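A possible reason the images do not show up: in the entries above, the image URL is a second `<link>` with no `rel` attribute, and in Atom a link without `rel` is treated as `rel="alternate"`. The Python sketch below, using only the standard library, lists the effective role of each link in one entry and then shows one way the image link could be marked as an enclosure. `link_roles` and `tag_image_links` are hypothetical helpers, matching on the image host is a crude heuristic, and whether Feeder actually reads enclosure links is an assumption that has not been verified here.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Trimmed copy of the second entry from the generated feed above.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
<id>https://blogs.blackberry.com/en/2024/09/memory-threat-detection</id>
<link href="https://blogs.blackberry.com/en/2024/09/memory-threat-detection"/>
<link href="https://images.blackberry.com/is/image/blackberry/memory-attack-thumb-466x261?wid=466&amp;fmt=jpg"/>
</entry>"""


def link_roles(entry):
    """Return (rel, href) per link; a missing rel defaults to 'alternate'."""
    return [(link.get("rel", "alternate"), link.get("href"))
            for link in entry.findall(ATOM + "link")]


def tag_image_links(entry, image_host, mime="image/jpeg"):
    """Mark links served from image_host as image enclosures (heuristic)."""
    for link in entry.findall(ATOM + "link"):
        if image_host in link.get("href", ""):
            link.set("rel", "enclosure")
            link.set("type", mime)


entry = ET.fromstring(ENTRY)
before = link_roles(entry)   # both links resolve to "alternate"
tag_image_links(entry, "images.blackberry.com")
after = link_roles(entry)    # the image link is now "enclosure"
```

As generated, a reader has no way to tell the article link from the image link, since both resolve to `alternate`.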


Below is the feed from: https://phys.org/rss-feed/

<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<title>Phys.org - latest science and technology news stories</title>
<link>https://phys.org/</link>
<language>en-us</language>
<description>Phys.org internet news portal provides the latest news on science including: Physics, Nanotechnology, Life Sciences, Space Science, Earth Science, Environment, Health and Medicine.</description>

<item>
<title>Europa Clipper: 8 things to know about NASA's mission to an ocean moon of Jupiter</title>
<description>The first NASA spacecraft dedicated to studying an ocean world beyond Earth, Europa Clipper aims to find out whether the ice-encased moon Europa could be habitable.</description>
<link>https://phys.org/news/2024-09-europa-clipper-nasa-mission-ocean.html</link>
<category>Space Exploration Planetary Sciences </category>
<pubDate>Tue, 17 Sep 2024 15:26:04 EDT</pubDate>
<guid isPermaLink="false">news645805562</guid>
<media:thumbnail url="https://scx1.b-cdn.net/csz/news/tmb/2024/8-things-to-know-about.jpg" width="90" height="90"/>
</item>

<item>
<title>Lord Kelvin: How the 19th century scientist combined research and innovation to change the world</title>
<description>"What got you into astrophysics?" It's a question I'm often asked at outreach events, and I answer by pointing to my early passion for exploring the biggest questions about our universe. Well, along with seeing Star Wars at an impressionable age.</description>
<link>https://phys.org/news/2024-09-lord-kelvin-19th-century-scientist.html</link>
<category>General Physics </category>
<pubDate>Tue, 17 Sep 2024 15:23:04 EDT</pubDate>
<guid isPermaLink="false">news645805382</guid>
<media:thumbnail url="https://scx1.b-cdn.net/csz/news/tmb/2024/kelvin.jpg" width="90" height="90"/>
</item>


Looks like the formatting is different? Another test URL: https://www.broadcom.com/blog
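The difference between the two feeds can be sketched in Python as follows. The phys.org items carry the image in a Media RSS `<media:thumbnail>` element, while the generated Atom entries carry it as a plain `<link>`. `find_image` is a hypothetical lookup showing what a reader might check for; it is not taken from Feeder's source, and the enclosure rule in it is an assumption about typical reader behaviour.

```python
import xml.etree.ElementTree as ET

MEDIA = "{http://search.yahoo.com/mrss/}"
ATOM = "{http://www.w3.org/2005/Atom}"

RSS_ITEM = """<item xmlns:media="http://search.yahoo.com/mrss/">
<link>https://phys.org/news/2024-09-lord-kelvin-19th-century-scientist.html</link>
<media:thumbnail url="https://scx1.b-cdn.net/csz/news/tmb/2024/kelvin.jpg" width="90" height="90"/>
</item>"""

ATOM_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
<link href="https://blogs.blackberry.com/en/2024/09/memory-threat-detection"/>
<link href="https://images.blackberry.com/is/image/blackberry/memory-attack-thumb-466x261?wid=466&amp;fmt=jpg"/>
</entry>"""


def find_image(element):
    """Return an image URL from a feed item or entry, or None if not found."""
    # Media RSS thumbnail, as used by the phys.org feed.
    thumb = element.find(MEDIA + "thumbnail")
    if thumb is not None:
        return thumb.get("url")
    # Atom link explicitly marked as an image enclosure.
    for link in element.findall(ATOM + "link"):
        is_image = (link.get("type") or "").startswith("image/")
        if link.get("rel") == "enclosure" and is_image:
            return link.get("href")
    return None


rss_image = find_image(ET.fromstring(RSS_ITEM))
atom_image = find_image(ET.fromstring(ATOM_ENTRY))
```

Under these assumptions the phys.org item yields its thumbnail URL, while the generated Atom entry yields nothing because its image link has neither `rel` nor `type`.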