Datafable / epu-index

EPU index
http://www.applieddatamining.com/cms/?q=content/economic-policy-uncertainty-index

Endpoint for highest ranking article for a day #46

Closed peterdesmet closed 9 years ago

peterdesmet commented 9 years ago

Create an endpoint to retrieve the highest ranking article for a day. This will power the functionality described in #16. This functionality cannot be provided by the endpoint described in #25, which (if built) returns article information that is considered private and thus requires authentication.

If no (highest ranking) article is available for a certain day, an empty object should be returned (@bartaelterman: or what is the consensus here?)

URL

I propose: https://epu-index.herokuapp.com/api/highest-ranking-article

Options

format=json
date=yyyy-mm-dd (required)

Returns

{
    "article_title": "Pluto en zijn grootste maan Charon zoals je ze nog nooit zag",
    "article_url": "http://www.demorgen.be/wetenschap/pluto-en-zijn-grootste-maan-charon-zoals-je-ze-nog-nooit-zag-a2390841/",
    "article_newspaper": "De morgen"
}

date, epu and score could optionally be returned as well.
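A minimal client-side sketch of the proposal above, assuming the URL and options are adopted exactly as written and that a day without an article yields an empty object (which is still an open question in this issue). The helper names `build_url` and `parse_article` are hypothetical and only serve the illustration; no request is actually sent.

```python
import json
from datetime import date
from urllib.parse import urlencode

# Base URL as proposed in this issue; the deployed path may differ.
BASE_URL = "https://epu-index.herokuapp.com/api/highest-ranking-article"


def build_url(day: date) -> str:
    """Build the request URL for one day (the date option is required)."""
    query = urlencode({"format": "json", "date": day.isoformat()})
    return f"{BASE_URL}?{query}"


def parse_article(body: str):
    """Return the article as a dict, or None when the body is an empty object."""
    data = json.loads(body)
    return data or None
```

For example, `build_url(date(2015, 7, 15))` produces the proposed URL with `format=json&date=2015-07-15` appended, and `parse_article("{}")` returns `None`.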

peterdesmet commented 9 years ago

@bartaelterman, please review.

peterdesmet commented 9 years ago

Updated return fields to use underscores to be more consistent with #45

bartaelterman commented 9 years ago

date seems redundant to return. epu and score actually mean the same thing. The score still needs to be added to the model, though. It is possible that we don't have the score for articles published before 2013 (see #52).

peterdesmet commented 9 years ago

  1. Do articles have scores? I thought they were ranked as positive/negative.
  2. How far back in time are we able to show the highest ranking article?

bartaelterman commented 9 years ago

  1. Yes, articles have individual scores ranging from minus infinity to infinity. If the score is > 0, the article is positive; otherwise it is negative.
  2. I have a file with articles from March 1994 until December 17, 2013. These articles were scraped and scored with the previous version of the software, and apparently the individual score for the articles was not saved (or at least, I don't have it). Note that this list contains only positive articles. I know that because, for instance, on January 8, 2000, the EPU index was 0.5 (meaning 1 positive article out of 2 journals scraped). If I look at the articles I got, I indeed find only one article that day. I repeated this for a couple of other days and it seems to fit.
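The sign rule and the January 8, 2000 sanity check can be written down as a small sketch. It assumes the daily index is simply the number of positive articles divided by the number of journals scraped, which is inferred from the single example above and not confirmed anywhere in this thread; the function names are hypothetical.

```python
def is_positive(score: float) -> bool:
    """An article is positive when its score is strictly greater than 0."""
    return score > 0


def epu_index(positive_articles: int, journals_scraped: int) -> float:
    """Daily index as positive articles per journal scraped (assumed formula)."""
    if journals_scraped <= 0:
        raise ValueError("journals_scraped must be positive")
    return positive_articles / journals_scraped
```

With 1 positive article and 2 journals scraped, `epu_index(1, 2)` gives 0.5, matching the value quoted for that day.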

We will start scraping articles from December 17, 2013 onwards.

We could score the old articles because we'll implement the scoring model anyway (see #51), but I am personally a bit wary about that: what happens if we come up with different results? We could spend a lot of time figuring out what went wrong. I marked #52 as a question, so I'll ask the user about this.

niconoe commented 9 years ago

Implemented! Please test and report any errors, or close the issue.

bartaelterman commented 9 years ago

Only one remark:

If I add two articles published on the same day, one with EPU score 18 and one with no EPU score (so null), and then request the highest ranking article for that day, I get the one with the empty score. It should be the other one.

niconoe commented 9 years ago

Sorry, that was clearly a bug. This is now fixed, by considering EPU=Null as EPU=0.

Does that seem correct? Or should EPU=Null values be totally excluded by this endpoint?

bartaelterman commented 9 years ago

This is fine.
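The behaviour agreed on above (EPU=Null treated as EPU=0 when ranking) can be sketched as follows. This is not the project's actual code: articles are modelled as plain dicts, and the `epu` and `title` keys are assumptions made for the illustration.

```python
from typing import Dict, List, Optional


def highest_ranking(articles: List[Dict]) -> Optional[Dict]:
    """Return the article with the highest EPU score for one day.

    A missing (None) score is ranked as 0. Returns None for an empty list.
    """
    if not articles:
        return None

    def rank(article: Dict) -> float:
        score = article.get("epu")
        return 0 if score is None else score

    return max(articles, key=rank)
```

Given one article with score 18 and one with no score, the function returns the article scored 18. Note that under this rule an article with a negative score ranks below an article with no score at all.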