commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Parse document created/updated dates #4

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

In many cases, a document's created/updated date would be useful information to show in the results.

There are many ways of getting this data from the page or the HTTP headers, with varying complexity and confidence. Let's investigate them!

Sentimentron commented 8 years ago

I had a look at this a long time ago. Whilst detecting the dates is quite straightforward, separating them into date published and date generated is quite tricky, especially if the article contains other dates. You've got several options:

sylvinus commented 8 years ago

Interesting links! The PDF seems to be down, is it this one? http://www2013.org/companion/p73.pdf

Sentimentron commented 8 years ago

That's it

sylvinus commented 8 years ago

I think we will end up using as many sources/methods as possible to cover all cases, so it would be okay to begin by implementing the simple ones!
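As a rough illustration of what two of the simple sources could look like, here is a minimal sketch using only the Python standard library. It reads Open Graph style `<meta>` tags and the `Last-Modified` HTTP header, and keeps the highest-confidence candidate per kind. The names `extract_dates` and `MetaDateParser` and the confidence values are hypothetical, made up for this example; nothing here is existing cosr-back code.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

# Hypothetical confidence scores, chosen only for illustration.
META_CONFIDENCE = {
    "article:published_time": ("published", 0.9),
    "article:modified_time": ("updated", 0.9),
    "og:updated_time": ("updated", 0.8),
}
# Last-Modified often reflects when the page was generated, so rank it low.
HEADER_CONFIDENCE = 0.3


class MetaDateParser(HTMLParser):
    """Collects (kind, datetime, confidence, source) from known <meta> tags."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        if key not in META_CONFIDENCE or not content:
            return
        kind, confidence = META_CONFIDENCE[key]
        # Older Python versions reject a trailing "Z" in fromisoformat().
        if content.endswith("Z"):
            content = content[:-1] + "+00:00"
        try:
            parsed = datetime.fromisoformat(content)
        except ValueError:
            return  # Not ISO 8601: ignore rather than guess.
        self.candidates.append((kind, parsed, confidence, "meta:" + key))


def extract_dates(html, headers):
    """Return {kind: (datetime, confidence, source)} with the best candidate per kind.

    Assumes `headers` is a plain dict keyed by the exact name "Last-Modified".
    """
    parser = MetaDateParser()
    parser.feed(html)
    candidates = list(parser.candidates)

    last_modified = headers.get("Last-Modified")
    if last_modified:
        try:
            parsed = parsedate_to_datetime(last_modified)
        except (TypeError, ValueError):
            parsed = None  # Malformed header: skip this source.
        if parsed is not None:
            candidates.append(
                ("updated", parsed, HEADER_CONFIDENCE, "header:Last-Modified")
            )

    best = {}
    for kind, parsed, confidence, source in candidates:
        if kind not in best or confidence > best[kind][1]:
            best[kind] = (parsed, confidence, source)
    return best
```

Keeping the source and a confidence value next to each date would let more methods be added later without changing how the results are combined.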