fossar / selfoss

multipurpose rss reader, live stream, mashup, aggregation web application
https://selfoss.aditu.de
GNU General Public License v3.0

Same News from Multiple Source? Any probable Solution? #939

Open bitvijays opened 7 years ago

bitvijays commented 7 years ago

Dear Selfoss Authors,

Thank you for creating such an awesome open-source tool. We were wondering whether there is a way to identify the same news item coming from two different sources. For example, "FBI helping Qatar in Hacking Probe:Source" is reported by Security Week, and the same news with an almost identical title is reported by Daily Mail. Is there any probable solution for this?

We had a look at ContentLoader.php in helpers. However, it currently only checks whether a new item from the same source is already in the database. We would be happy to contribute code if we have a decent solution.

Yours Sincerely,

jtojnar commented 7 years ago

This is one of the use cases I was considering for plug-ins (#877).

How do you determine if an item is talking about the same news?

If you download the page to obtain the source link, spouts might be a better place for the code, since some already fetch the whole page text (FullTextRss). You could add a getSimilarityMetadata method returning the required data. Then, in the ContentLoader, you would compute the similarity and mark the items.
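
A minimal sketch of that flow, written in Python purely for illustration (selfoss itself is PHP, and none of this is existing selfoss code). It assumes the similarity metadata is just the set of lowercased words in the title and that similarity is measured with the Jaccard index. The function names, the item fields (`id`, `source`, `title`, `duplicate_of`), the 0.7 threshold and the Daily Mail headline are all hypothetical.

```python
import re


def get_similarity_metadata(item):
    """Hypothetical stand-in for the proposed getSimilarityMetadata:
    returns the set of lowercased alphanumeric words in the title."""
    return set(re.findall(r"[a-z0-9]+", item["title"].lower()))


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def mark_duplicates(new_items, known_items, threshold=0.7):
    """Mark each new item with the id of the first known item from a
    different source whose title is similar enough, or None otherwise."""
    for item in new_items:
        item["duplicate_of"] = None
        meta = get_similarity_metadata(item)
        for known in known_items:
            if known["source"] == item["source"]:
                continue  # same-source duplicates are already handled elsewhere
            if jaccard(meta, get_similarity_metadata(known)) >= threshold:
                item["duplicate_of"] = known["id"]
                break
    return new_items


known = [
    {"id": 1, "source": "Security Week",
     "title": "FBI helping Qatar in Hacking Probe:Source"},
]
new = [
    # Assumed headline; the thread only says the titles are almost the same.
    {"id": 2, "source": "Daily Mail",
     "title": "FBI helping Qatar in hacking probe, source says"},
    {"id": 3, "source": "Daily Mail",
     "title": "Unrelated story about football"},
]
marked = mark_duplicates(new, known)
```

Word-set overlap is only one possible metric. It ignores word order and synonyms, so a real implementation would likely need a tuned threshold or a comparison of the full page text that some spouts already fetch.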