DocNow / diffengine

track changes to the news, where news is anything with an RSS feed
MIT License

Dealing with url changes #40

Open pallih opened 7 years ago

pallih commented 7 years ago

I'm setting up a tracker for http://visir.is (an Icelandic news site).

I've noticed that when a headline is changed, their system generates a new URL for the article.

The urls are made up of these elements:

http://visir.is/g//< HEADLINE >

The < HEADLINE > part is redundant: the article can be viewed without it.

To get around this, I made some changes that allow a regex to be applied to each URL from the RSS feed. See here:

https://github.com/pallih/diffengine/commit/0519f3114e0aebbb4c428152a1fa3894d6c9c769

With this change, the URL that gets checked is http://visir.is/g/ so subsequent changes to the headline are picked up as edits to the same article, and not stored as a new article.
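
For anyone who wants the general idea without reading the commit, here is a minimal sketch. It is not the code from the commit above. The function name `normalize_url`, the pattern, and the example URLs are all hypothetical. It assumes the stable part of the URL is everything up to and including the first path segment after `/g/`.

```python
import re

# Hypothetical pattern: keep scheme, host, "/g/" and the first path
# segment after it; everything that follows (the headline) is dropped.
STRIP_HEADLINE = re.compile(r"^(https?://[^/]+/g/[^/]+).*$")


def normalize_url(url, pattern=STRIP_HEADLINE):
    """Return the stable part of url (capture group 1 of pattern).

    URLs that do not match the pattern are returned unchanged, so
    feeds from other sites are not affected.
    """
    match = pattern.match(url)
    return match.group(1) if match else url


# Two made-up URLs for the same article, before and after a headline edit.
before = normalize_url("http://visir.is/g/2017123/old-headline")
after = normalize_url("http://visir.is/g/2017123/new-headline")
```

Both calls return the same string, so a tracker that keys articles on the normalized URL would treat the second fetch as a new version of the same article instead of a new entry.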

I'm not sure introducing a config variable is appropriate for the project, but at least my solution is there, if anyone needs it.