Open ptrailblazer89 opened 2 months ago
Thanks for sharing these examples @ptrailblazer89.
I've added image support to the feed. By the way, this site provides its own RSS feed that should be much faster but also lacks images. Might want to give that a shot.
I see the issue here. The first entry (PCB Design and UI for a Canbus Motorsports Device) does not have a discrete date, so by default it is given today's date. "Last week" is too amorphous for Diffbot Extract to tell what day it is. I can't fix the date interpretation unfortunately, but this seems to only affect one of the entries.
We'll review this in the base extraction model. Will report back when it's fixed.
Thanks for the quick response. The image problem still persists.
The Feeder Android app is unable to decode the images: https://play.google.com/store/apps/details?id=com.nononsenseapps.feeder.play&hl=en_IN&pli=1 https://github.com/spacecowboy/Feeder
TEST SITE: https://blogs.blackberry.com/en/home
Below is the feed generated from: https://rss.diffbot.com/atom?url=https://blogs.blackberry.com/en/home
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom"><id>https://blogs.blackberry.com/en/home</id><title>BlackBerry Blogs</title><updated>2024-09-17T19:23:27.130771+00:00</updated><link href="https://rss.diffbot.com/rss?url=https://blogs.blackberry.com/en/home" rel="self"/><link href="https://blogs.blackberry.com/en/home" rel="alternate"/><generator uri="https://lkiesow.github.io/python-feedgen" version="1.0.0">python-feedgen</generator><icon>https://blogs.blackberry.com/etc.clientlibs/bb-spa-react/clientlibs/clientlib-react/resources/logo192.png</icon><subtitle>https://blogs.blackberry.com/en/home</subtitle>
<entry>
<id>https://blogs.blackberry.com/en/2024/09/top-multi-tenancy-console-cylance</id>
<title>Elevate Your IT Operations with the Updated Cylance Multi-Tenant Console</title>
<updated>2024-09-17T19:23:27.134618+00:00</updated>
<content>Announcing powerful updates we recently unveiled in the Cylance Multi-Tenant Console (MTC). It's the next step toward the future of IT Management.</content>
<link href="https://blogs.blackberry.com/en/2024/09/top-multi-tenancy-console-cylance"/>
<link href="https://images.blackberry.com/is/image/blackberry/multi-tenant-thumb-466x261?wid=466&amp;fmt=jpg"/>
<published>2024-09-13T00:00:00+00:00</published></entry>
<entry>
<id>https://blogs.blackberry.com/en/2024/09/memory-threat-detection</id>
<title>Detecting Threats in Memory: The Role of Advanced Sensors</title>
<updated>2024-09-17T19:23:27.134249+00:00</updated>
<content>Traditional methods often fail to detect memory-based cyberattacks. Advanced sensors that monitor and analyze memory are key to closing this gap.</content>
<link href="https://blogs.blackberry.com/en/2024/09/memory-threat-detection"/>
<link href="https://images.blackberry.com/is/image/blackberry/memory-attack-thumb-466x261?wid=466&amp;fmt=jpg"/>
<published>2024-09-12T00:00:00+00:00</published>
</entry>
Below is the feed from: https://phys.org/rss-feed/
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<title>Phys.org - latest science and technology news stories</title>
<link>https://phys.org/</link>
<language>en-us</language>
<description>Phys.org internet news portal provides the latest news on science including: Physics, Nanotechnology, Life Sciences, Space Science, Earth Science, Environment, Health and Medicine.</description>
<item>
<title>Europa Clipper: 8 things to know about NASA's mission to an ocean moon of Jupiter</title>
<description>The first NASA spacecraft dedicated to studying an ocean world beyond Earth, Europa Clipper aims to find out whether the ice-encased moon Europa could be habitable.</description>
<link>https://phys.org/news/2024-09-europa-clipper-nasa-mission-ocean.html</link>
<category>Space Exploration Planetary Sciences </category>
<pubDate>Tue, 17 Sep 2024 15:26:04 EDT</pubDate>
<guid isPermaLink="false">news645805562</guid>
<media:thumbnail url="https://scx1.b-cdn.net/csz/news/tmb/2024/8-things-to-know-about.jpg" width="90" height="90"/>
</item>
<item>
<title>Lord Kelvin: How the 19th century scientist combined research and innovation to change the world</title>
<description>"What got you into astrophysics?" It's a question I'm often asked at outreach events, and I answer by pointing to my early passion for exploring the biggest questions about our universe. Well, along with seeing Star Wars at an impressionable age.</description>
<link>https://phys.org/news/2024-09-lord-kelvin-19th-century-scientist.html</link>
<category>General Physics </category>
<pubDate>Tue, 17 Sep 2024 15:23:04 EDT</pubDate>
<guid isPermaLink="false">news645805382</guid>
<media:thumbnail url="https://scx1.b-cdn.net/csz/news/tmb/2024/kelvin.jpg" width="90" height="90"/>
</item>
Looks like the two feeds use different formatting.
Also, the dates do not seem to be captured well: https://www.simplygon.com/blog
For future corrections, adding one more error (how the error has crept in is highlighted in the images): https://rss.diffbot.com/atom?url=https://news.broadcom.com/apj/category/technologies
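To make the formatting difference concrete, here is a minimal reader-side sketch. It is not how Feeder or RSS Anything works internally: `describe_entry` is a hypothetical helper, and the sample entry is abridged from the BlackBerry feed above. It relies on the Atom rule that a `<link>` without a `rel` attribute is treated as `rel="alternate"`, so the image URL in the generated feed looks like just another article link, whereas the phys.org feed marks its image explicitly with `media:thumbnail`.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
MEDIA = "http://search.yahoo.com/mrss/"

# Abridged from the BlackBerry entry above ("&" escaped so the XML parses).
SAMPLE_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>https://blogs.blackberry.com/en/2024/09/memory-threat-detection</id>
  <link href="https://blogs.blackberry.com/en/2024/09/memory-threat-detection"/>
  <link href="https://images.blackberry.com/is/image/blackberry/memory-attack-thumb-466x261?wid=466&amp;fmt=jpg"/>
</entry>
"""


def describe_entry(entry_xml):
    """Return (links, thumbnails) for one Atom entry (hypothetical helper).

    links is a list of (rel, href) pairs; a missing rel defaults to "alternate".
    thumbnails is a list of media:thumbnail URLs found anywhere in the entry.
    """
    entry = ET.fromstring(entry_xml)
    links = [
        (link.get("rel", "alternate"), link.get("href"))
        for link in entry.findall(f"{{{ATOM}}}link")
    ]
    thumbnails = [t.get("url") for t in entry.iter(f"{{{MEDIA}}}thumbnail")]
    return links, thumbnails


links, thumbnails = describe_entry(SAMPLE_ENTRY)
for rel, href in links:
    print(rel, href)
print("thumbnails:", thumbnails)
```

In this sample both links come back as `alternate` and no thumbnail is found, which may be why a reader that looks for `media:thumbnail` shows no image for it.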
Also, the dates do not seem to be captured well: https://www.simplygon.com/blog
RSS Anything was not able to determine a date from this site because the dates are written in a format that makes the month or day ambiguous. It's something Diffbot is looking into with the extraction model, but there's no immediate fix. In the meantime, RSS Anything does not include a date with these entries. It looks like your RSS reader falls back to a default (today) when no date exists.
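As an illustration of why such dates are hard to resolve, the sketch below tries a numeric date under both day-first and month-first ordering. The date strings are made-up examples, not taken from the Simplygon site, and `possible_dates` and `is_ambiguous` are hypothetical helpers; this is not how the Diffbot extraction model works.

```python
from datetime import datetime


def possible_dates(text):
    """Return the distinct dates a numeric xx/xx/yyyy string could mean."""
    candidates = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):  # day-first, then month-first
        try:
            candidates.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # the string is not valid under this ordering
    return sorted(candidates)


def is_ambiguous(text):
    """True when the string is a valid date under both orderings."""
    return len(possible_dates(text)) > 1


# Made-up examples: the first is valid both ways, the second only day-first.
for sample in ("03/04/2024", "23/09/2024"):
    print(sample, possible_dates(sample), is_ambiguous(sample))
```

A string like `03/04/2024` can be read as either 3 April or 4 March, so leaving the date out is safer than guessing one.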
For future corrections, adding one more error (how the error has crept in is highlighted in the images)
As with the dates, this issue has to do with the extraction model. The error has been reported to Diffbot. There is no timeline for a fix; I will update this issue when it is resolved.
Thanks for the Addition ... That Helps a Lot
Images were indeed added to:
but for the rest, images were not displayed.
Blackberry Blog: https://rss.diffbot.com/atom?url=https://blogs.blackberry.com/en/home
Micron Blog: https://rss.diffbot.com/atom?url=https://www.micron.com/about/blog/company/insights
Broadcom Blog
<entry>
<id>https://www.broadcom.com/blog/broadcom-innovation-in-engineering-artificial-intelligence-meets-software</id>
<title>Broadcom innovation in engineering: Artificial Intelligence meets software</title>
<updated>2024-09-23T20:36:09.657502+00:00</updated>
<content/>
<link href="https://www.broadcom.com/blog/broadcom-innovation-in-engineering-artificial-intelligence-meets-software"/>
<link href="https://www.broadcom.com/media/blt4ac44e0e6c6d8341/blt61fb45ad232605c1/66da1d7c025895664ec3e7e5/21772-mco-grphc-ai-universe-blog-image-GettyImages-1163715561-1920x455_v1_(1).jpg?width=374"/>
<published>2024-09-05T00:00:00+00:00</published>
<media:group>
<media:thumbnail url="https://www.broadcom.com/media/blt4ac44e0e6c6d8341/blt61fb45ad232605c1/66da1d7c025895664ec3e7e5/21772-mco-grphc-ai-universe-blog-image-GettyImages-1163715561-1920x455_v1_(1).jpg?width=374"/>
</media:group>
</entry>
Simplygon Blog
<entry>
<id>https://www.simplygon.com/posts/8b300a18-53e6-4ddc-aa71-b1badc322265</id>
<title>Optimize Unreal Engine levels with visibility culled Stand-Ins</title>
<updated>2024-09-23T21:04:25.357125+00:00</updated>
<content>This blog will cover how to use Stand-Ins in Unreal Engine to replace distant meshes with simple proxy meshes. We will use visibility culling to cull away any geometry not visible from the player's perspectiv</content>
<link href="https://www.simplygon.com/posts/8b300a18-53e6-4ddc-aa71-b1badc322265"/>
<link href="https://blogcontents.simplygon.com/media/baa3d8c0-b8dc-45bb-a96a-6b01ab889265/header.webp"/>
<media:group>
<media:thumbnail url="https://blogcontents.simplygon.com/media/baa3d8c0-b8dc-45bb-a96a-6b01ab889265/header.webp"/>
</media:group>
</entry>
Blackberry Blog
<entry>
<id>https://blogs.blackberry.com/en/2024/09/ten-cyberattack-types</id>
<title>10 Types of Cyberattacks Targeting Organizations Now</title>
<updated>2024-09-23T20:12:18.367119+00:00</updated>
<content>In this blog, we explore ten types of cyberthreats to organizations.</content>
<link href="https://blogs.blackberry.com/en/2024/09/ten-cyberattack-types"/>
<link href="https://images.blackberry.com/is/image/blackberry/top-10-attack-techniquesthumb-466x261?wid=466&amp;fmt=jpg"/>
<published>2024-09-18T00:00:00+00:00</published>
<media:group>
<media:thumbnail url="https://images.blackberry.com/is/image/blackberry/top-10-attack-techniquesthumb-466x261?wid=466&amp;fmt=jpg"/>
</media:group>
</entry>
Micron Blog
<entry>
<id>https://www.micron.com/about/blog/company/insights/microns-commitment-to-meeting-customer-demand-and-operational-excellence</id>
<title>Micron's commitment to meeting customer demand and operational excellence</title>
<updated>2024-09-23T20:47:33.060316+00:00</updated>
<content/>
<link href="https://www.micron.com/about/blog/company/insights/microns-commitment-to-meeting-customer-demand-and-operational-excellence"/>
<link href="https://dmassets.micron.com/is/image/microntechnology/195025829-warehouse-robotic-arm%3A16-9-hero-banner-tertiary?ts=1726588157572&amp;dpr=off"/>
<published>2024-09-23T00:00:00+00:00</published>
<media:group>
<media:thumbnail url="https://dmassets.micron.com/is/image/microntechnology/195025829-warehouse-robotic-arm%3A16-9-hero-banner-tertiary?ts=1726588157572&amp;dpr=off"/>
</media:group>
</entry>
I think the file extensions (.jpg, .webp) at the end of the URLs could be the problem for Feeder, as the ones below don't have them, while Firefox is able to load the images from those links. Is there a solution?
As an update, the RSS feeds for the last two sites work fine, with images displayed correctly. Only the automatic extraction algorithm needs to be corrected for the errors you pointed out.
Thanks once again for the RSS feed service, as it is really very useful 👍.
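The extension guess can at least be checked against the four thumbnail URLs pasted above. The sketch below only reports whether each URL's path ends in a file extension; `path_extension` is a hypothetical helper, and the result says nothing about what Feeder actually does with these URLs.

```python
from posixpath import splitext
from urllib.parse import urlparse

# Thumbnail URLs copied from the four entries above.
THUMBNAILS = {
    "Broadcom": "https://www.broadcom.com/media/blt4ac44e0e6c6d8341/blt61fb45ad232605c1/66da1d7c025895664ec3e7e5/21772-mco-grphc-ai-universe-blog-image-GettyImages-1163715561-1920x455_v1_(1).jpg?width=374",
    "Simplygon": "https://blogcontents.simplygon.com/media/baa3d8c0-b8dc-45bb-a96a-6b01ab889265/header.webp",
    "BlackBerry": "https://images.blackberry.com/is/image/blackberry/top-10-attack-techniquesthumb-466x261?wid=466&fmt=jpg",
    "Micron": "https://dmassets.micron.com/is/image/microntechnology/195025829-warehouse-robotic-arm%3A16-9-hero-banner-tertiary?ts=1726588157572&dpr=off",
}


def path_extension(url):
    """Return the lowercase file extension of the URL path, or "" if none.

    The query string is ignored, so "?fmt=jpg" does not count as an extension.
    """
    return splitext(urlparse(url).path)[1].lower()


for site, url in THUMBNAILS.items():
    print(site, repr(path_extension(url)))
```

This matches the pattern described: the Broadcom and Simplygon paths carry `.jpg` and `.webp`, while the BlackBerry and Micron paths have no extension. Whether that is the actual cause remains unconfirmed.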
P.S.:
The author of every post is shown as Jerome Choo 😱. Are authors not extracted?
An option to pull the first image out of an article and put it in the RSS feed would help for websites, or individual articles, that don't provide one, e.g. the IEEE publication list (https://ieeexplore.ieee.org/xpl/topAccessedArticles.jsp?punumber=12). Can the base extraction model be tweaked to do so? (https://techxplore.com/ and https://phys.org/ do this very well.)
Some sites display more posts only after an action such as clicking a "more feeds" button or scrolling down. rss.diffbot.com is unable to do this and only picks up the first few posts.
An option to get entries from the first n pages, rather than from just the first page, which could contain very few.
If a date is not given or found (say it is internal to the post, as pointed out in the previous issue), articles are displayed from the bottom up (crawling from the bottom?).
Hello, Diffbot,
The service helps a lot in providing RSS feeds for websites that don't have one, but the following problems were noted:
Blog images, which give insight into an article at a glance, aren't captured (https://politepol.com/en/ does this well): https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/bg-p/blog-ai
Not everything is captured, nor is it in chronological order: https://www.upwork.com/nx/search/jobs/?q=embedded
Sometimes garbage gets captured along with the feed, sometimes nothing gets captured at all, and sometimes the title isn't captured correctly (dates are put in as titles and titles as text): https://www.quectel.com/blog/