bubelov / news

Feed Reader and Podcast Player for Android
https://f-droid.org/packages/co.appreactor.news/
GNU General Public License v3.0

Google News RSS feed shows duplicates #134

Closed slodown314 closed 2 years ago

slodown314 commented 2 years ago

Hi, I'm using the Google News RSS feed in Standalone Mode. When I do a manual update fetch, some feed entries are shown two or three times. It would be nice to add a feature to remove those duplicates. Thank you!

bubelov commented 2 years ago

@slodown did you download it from Google Play?

slodown314 commented 2 years ago

No, I'm using the one from F-Droid. I didn't know there is a Google Play version...

slodown314 commented 2 years ago

https://news.google.com/rss?hl=de&gl=DE&ceid=DE:de

bubelov commented 2 years ago

@slodown The Google Play version was banned, so it's better not to use it anyway. Google wants app developers to be able to censor content, which is not compatible with many feed readers, including this one

bubelov commented 2 years ago

@slodown I tried adding your feed and I don't see any duplicates yet. Do they appear immediately, or can it take time for them to show up?

bubelov commented 2 years ago

@slodown disregard the previous message, I can see duplicates now

bubelov commented 2 years ago

Here is how this app creates a unique ID:

id = sha256("$feedId:$title:$description")

Most feeds work fine with that, because their articles rarely change their title or description. The Google feed seems to update those fields pretty often, and each change appears as a separate news entry
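
To illustrate why an edited field shows up as a new entry, here is a minimal Python sketch of an ID derived from the feed ID, title, and description. It is not the app's actual code: the function name `content_id` and the sample values are hypothetical, and only the `feedId:title:description` layout comes from the formula above.

```python
import hashlib


def content_id(feed_id: str, title: str, description: str) -> str:
    """Hash the feed ID, title, and description into one hex digest."""
    raw = f"{feed_id}:{title}:{description}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Hypothetical sample values, used only for illustration.
first = content_id("feed-1", "Some headline", "Original summary")
same = content_id("feed-1", "Some headline", "Original summary")
edited = content_id("feed-1", "Some headline", "Updated summary")

print(first == same)    # identical fields give the same ID
print(first == edited)  # an edited description gives a different ID
```

Because the ID depends on the text itself, any edit to the title or description produces an entry that looks new to the reader.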

This feed supports guid though:

guid stands for globally unique identifier. It's a string that uniquely identifies the item. When present, an aggregator may choose to use this string to determine if an item is new

I don't really remember why I don't use it in cases where it's available. I'll experiment with it and let you know when it's done
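
One way the experiment could look is sketched below in Python: use the guid when the item has one, and otherwise fall back to hashing the title and description. This is an assumption about a possible approach, not the app's implementation, and `entry_id` and its parameters are hypothetical names.

```python
import hashlib
from typing import Optional


def entry_id(
    feed_id: str,
    title: str,
    description: str,
    guid: Optional[str] = None,
) -> str:
    """Derive an entry ID, preferring the feed-supplied guid when present."""
    if guid is not None and guid.strip():
        # The guid is scoped by feed ID so two feeds cannot collide.
        raw = f"{feed_id}:{guid.strip()}"
    else:
        # No usable guid: fall back to hashing the item's content.
        raw = f"{feed_id}:{title}:{description}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Hypothetical sample values: the description changes, the guid does not.
before = entry_id("feed-1", "Some headline", "Original summary", guid="abc-123")
after = entry_id("feed-1", "Some headline", "Updated summary", guid="abc-123")
print(before == after)  # the ID stays stable as long as the guid is stable
```

This only helps if the feed keeps each item's guid constant.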

bubelov commented 2 years ago

It's not that simple, this feed is really odd...

First, it assigns an integer guid to its stories, like:

<title>
Russia pushes war in Ukraine close to U.S. NATO allies' border with airstrikes near Poland as talks continue - CBS News
</title>
<guid isPermaLink="false">1333000611</guid>

Then, it magically becomes a string:

<title>
Russia pushes war in Ukraine close to U.S. NATO allies' border with airstrikes near Poland as talks continue - CBS News
</title>
<guid isPermaLink="false">
CAIiEPGlgb2A8k_SNFmNjAEI61UqGQgEKhAIACoHCAowyNj6CjDyiPICMJyFxQU
</guid>

It's the same story with the same title, but Google changes its id every minute. This id is supposed to be constant, because every distinct id is interpreted as a separate news item. I'm not sure if there are good ways to fix this in the app, this feed seems to be completely broken. Do you know any feed reader apps which don't show duplicates for this feed?
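
The mismatch can be reproduced with a short Python sketch that parses the two snapshots quoted in this comment and compares their fields. The `<item>` wrapper and the helper name `title_and_guid` are added here for illustration; the title and guid values are copied from the fragments above.

```python
import xml.etree.ElementTree as ET

SNAPSHOT_1 = """<item>
<title>
Russia pushes war in Ukraine close to U.S. NATO allies' border with airstrikes near Poland as talks continue - CBS News
</title>
<guid isPermaLink="false">1333000611</guid>
</item>"""

SNAPSHOT_2 = """<item>
<title>
Russia pushes war in Ukraine close to U.S. NATO allies' border with airstrikes near Poland as talks continue - CBS News
</title>
<guid isPermaLink="false">
CAIiEPGlgb2A8k_SNFmNjAEI61UqGQgEKhAIACoHCAowyNj6CjDyiPICMJyFxQU
</guid>
</item>"""


def title_and_guid(item_xml: str) -> tuple:
    """Return the whitespace-trimmed title and guid of one item."""
    item = ET.fromstring(item_xml)
    title = (item.findtext("title") or "").strip()
    guid = (item.findtext("guid") or "").strip()
    return title, guid


title_1, guid_1 = title_and_guid(SNAPSHOT_1)
title_2, guid_2 = title_and_guid(SNAPSHOT_2)

print(title_1 == title_2)  # same story title in both snapshots
print(guid_1 == guid_2)    # the guid differs between snapshots
```

A reader that keys entries on the guid therefore stores the two snapshots as two separate items.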

slodown314 commented 2 years ago

Thanks for your troubleshooting and review.

The "inoreader" app should have a feature that can handle this, but it was not working for Google News; see this blog post: https://www.inoreader.com/blog/2020/08/win-the-clone-wars-with-duplicate-filters.html

I'm now using Miniflux as my feed platform; before that I used Nextcloud. I also tried the Atom feed: https://news.google.com/atom?hl=de&gl=DE&ceid=DE:de No success :(

bubelov commented 2 years ago

Closing because this feed breaks the Atom spec; there is nothing the app can reliably do to fix it