ViennaRSS / vienna-rss

Vienna is a free and open-source RSS/Atom newsreader for macOS.
https://www.vienna-rss.com
Apache License 2.0

Numerous duplicate posts in BBC feed #1745

Open slammer99uk opened 2 months ago

slammer99uk commented 2 months ago

I leave my iMac on overnight to download articles every 60 minutes. Recently (for about 3 months) I have noticed continual duplicate or even triplicate posts in the BBC news feed window. As far as I am aware, I changed nothing at my end.

[Image: Screenshot 2024-04-15 at 07 49 04]

This is the RSS link: https://feeds.bbci.co.uk/news/world/rss.xml

Eitot commented 2 weeks ago

I can confirm this. I have checked the database and found that the articles are identical except for the GUID (globally unique identifier) that the feed itself provides. For instance:

SELECT title, message_id, link, date, revised_flag
FROM messages
WHERE title LIKE '%Belarus opposition leader warns%';

title | message_id | link | date | revised_flag
Belarus opposition leader warns Poland over borders | https://www.bbc.com/news/articles/cldddvpgk90o#0 | https://www.bbc.com/news/articles/cldddvpgk90o | 1719161442.0 | 0
Belarus opposition leader warns Poland over borders | https://www.bbc.com/news/articles/cldddvpgk90o#1 | https://www.bbc.com/news/articles/cldddvpgk90o | 1719161442.0 | 0

Note the trailing #0 and #1 in the message_id column. I have checked the source of the RSS feed, and this is where those GUID values originate. According to the RSS specification, the GUID value provided by the feed ought to be treated as an opaque string, as-is.
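
For anyone who wants to repeat the check against the feed source, here is a minimal Python sketch. It groups the GUIDs in a feed by the GUID with any #fragment removed, so entries that differ only by a trailing #0 or #1 end up in the same group. The embedded feed is a hypothetical two-item excerpt modelled on the rows above, not a copy of the real BBC feed, and the function name guids_by_base_url is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urldefrag

# Hypothetical excerpt modelled on the database rows above (assumption:
# the real feed has more elements and attributes per item).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Belarus opposition leader warns Poland over borders</title>
      <link>https://www.bbc.com/news/articles/cldddvpgk90o</link>
      <guid>https://www.bbc.com/news/articles/cldddvpgk90o#0</guid>
    </item>
    <item>
      <title>Belarus opposition leader warns Poland over borders</title>
      <link>https://www.bbc.com/news/articles/cldddvpgk90o</link>
      <guid>https://www.bbc.com/news/articles/cldddvpgk90o#1</guid>
    </item>
  </channel>
</rss>
"""


def guids_by_base_url(feed_xml):
    """Group item GUIDs by the GUID with any '#fragment' stripped."""
    groups = defaultdict(list)
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid is None:
            continue  # items without a GUID are ignored in this sketch
        guid = guid.strip()
        base, _fragment = urldefrag(guid)
        groups[base].append(guid)
    return dict(groups)


groups = guids_by_base_url(SAMPLE_FEED)
for base, guids in groups.items():
    if len(guids) > 1:
        print(base, "has", len(guids), "distinct GUIDs:", guids)
```

Compared as plain strings, the two GUIDs are different, which is why a reader that follows the specification stores two articles. Stripping the fragment is only a diagnostic here; it is not something the specification asks readers to do.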

Eitot commented 2 weeks ago

I don't think that this is something that can or ought to be fixed by Vienna. The entries appear to be identical except for the GUID value. Even the body text is identical. There is no other way to distinguish those articles. I don't understand why the BBC would publish the same entries multiple times. I haven't found a source for this practice of using #<number> either.
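
To gauge how many articles are affected, the database check can be turned into a grouping query. The sketch below is an assumption-laden illustration: it builds an in-memory SQLite table with only the five columns from the query above (the real messages table may have more), fills it with the two rows shown plus one made-up control row, and lists every group of rows that share link, title and date but have more than one message_id. The helper name duplicate_groups is hypothetical.

```python
import sqlite3

# In-memory stand-in for the reader's database (assumption: only the
# columns that appear in the query above are modelled here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "title TEXT, message_id TEXT, link TEXT, date REAL, revised_flag INTEGER)"
)

ARTICLE = "https://www.bbc.com/news/articles/cldddvpgk90o"
rows = [
    ("Belarus opposition leader warns Poland over borders",
     ARTICLE + "#0", ARTICLE, 1719161442.0, 0),
    ("Belarus opposition leader warns Poland over borders",
     ARTICLE + "#1", ARTICLE, 1719161442.0, 0),
    # Made-up control row that has no duplicate.
    ("Hypothetical unrelated article",
     "https://example.org/a#0", "https://example.org/a", 1719160000.0, 0),
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", rows)


def duplicate_groups(connection):
    """Return (link, title, copies) for articles stored more than once."""
    query = """
        SELECT link, title, COUNT(*) AS copies
        FROM messages
        GROUP BY link, title, date
        HAVING COUNT(*) > 1
        ORDER BY link
    """
    return connection.execute(query).fetchall()


for link, title, copies in duplicate_groups(conn):
    print(f"{copies} copies: {title} ({link})")
```

The query only reports duplicates. It deliberately does not delete or merge anything, since nothing but the GUID distinguishes the rows.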