Datafable / epu-index

EPU index
http://www.applieddatamining.com/cms/?q=content/economic-policy-uncertainty-index

Determine time zone of articles #42

Closed by bartaelterman 9 years ago

bartaelterman commented 9 years ago

Hypothesis:

Every newspaper returns the date time in the time zone in which the article was published. That is, if an article was published in winter, the time zone is UTC+01, while if it was published in summer, the time zone is UTC+02.

I can test whether the returned time zone is indeed fixed:

Visit an article for each journal and note the published date. Change your system's locale (time zone setting) and visit the articles again. If the timestamp shown for one of the articles changed, then that journal uses the user's time zone to return timestamps. If not, the returned time zone is fixed.

However, if the time zone is indeed fixed, I don't see a way to determine which time zone it is (is it UTC+01 or UTC+02 or - for some crazy reason - UTC?). @peterdesmet do you have an idea how we could determine the time zone?

peterdesmet commented 9 years ago

Why do we need to know the time zone of the article?

bartaelterman commented 9 years ago

How else should we store the "date_published"? We could make it a timestamp that is unaware of a timezone, but our app might include data for other countries in the future too. If we store all articles without time zone information, this could lead to confusing results.

bartaelterman commented 9 years ago

After a chat we decided to store the date times verbatim (as is, without time zone information).

If journals do not return the time zone with their published articles, no algorithm can always find the correct time zone. For example, on the night that the clocks change to winter time, 3 A.M. becomes 2 A.M., so 2:30 A.M. occurs twice that night and the time zone of an article with that published datetime cannot be deduced.
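A minimal sketch of that ambiguity, assuming the journals publish in Belgian local time (`Europe/Brussels`) and using the 2015 switch to winter time as the example night. The helper name `candidate_utc_offsets` is made up for illustration and is not part of the project:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+, needs time zone data on the system


def candidate_utc_offsets(naive, zone_name="Europe/Brussels"):
    """Return every UTC offset a naive wall-clock time could have in a zone.

    Hypothetical helper. The zone name is an assumption: the stored
    values themselves do not say which zone they belong to.
    """
    zone = ZoneInfo(zone_name)
    # fold=0 picks the first occurrence of a repeated wall-clock time,
    # fold=1 the second. For ordinary times both give the same offset.
    offsets = {
        naive.replace(tzinfo=zone, fold=fold).utcoffset() for fold in (0, 1)
    }
    return sorted(offsets)


# 2:30 A.M. on the night the clocks went back (25 October 2015).
ambiguous = candidate_utc_offsets(datetime(2015, 10, 25, 2, 30))
# An ordinary summer afternoon for comparison.
unambiguous = candidate_utc_offsets(datetime(2015, 7, 1, 12, 0))

print(ambiguous)    # two candidates: UTC+01 and UTC+02
print(unambiguous)  # one candidate: UTC+02
```

Even with the zone known, the 2:30 A.M. value maps to two different instants, so nothing in the stored value can settle which one was meant.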

If we store the verbatim date time, the researchers can always try to find the correct time zone afterwards.

Should discuss this with the client (hence the "question" label).

peterdesmet commented 9 years ago

Related: in what time zone should the frontend show the dates? In what time zone is the API currently returning dates? I assume a "day" returned by the API should be the same "day" bucket in which the EPU is calculated and aggregated?

bartaelterman commented 9 years ago

Customer agrees with the proposed solution.

@peterdesmet I think the front end should do the same thing: return the datetime exactly as it was stored in the database (so timezone unaware).

peterdesmet commented 9 years ago

Frontend currently shows all dates as UTC:

Is this OK?

bartaelterman commented 9 years ago

This is not entirely correct. The input datetimes are timezone unaware, so it's basically impossible to show them as UTC. What we are actually doing is assuming they are all in the same time zone (whatever that may be), and the front end should show them in that time zone.
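To illustrate why labelling the stored values as UTC matters, here is a small sketch. The stored value and the viewer's UTC+02 offset are made-up assumptions, not data from the app:

```python
from datetime import datetime, timedelta, timezone

# A timezone-unaware value as it would sit in the database (made-up example).
stored = datetime(2015, 6, 1, 23, 30)

# Assumed viewer offset of UTC+02, chosen only for illustration.
viewer_zone = timezone(timedelta(hours=2))

# Labelling the value as UTC and then converting it for the viewer
# moves the wall-clock time, and here also the calendar day.
mislabelled = stored.replace(tzinfo=timezone.utc).astimezone(viewer_zone)

# Showing the value verbatim keeps the day it was stored under.
shown_verbatim = stored.isoformat(sep=" ")

print(mislabelled.date())  # 2015-06-02
print(shown_verbatim)      # 2015-06-01 23:30:00
```

Any client that converts a UTC-labelled value to local time would put this article in a different "day" bucket than the one used when the EPU is aggregated.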

Everything else is ok.

bartaelterman commented 9 years ago

I should verify whether all date times are saved without time zone, because if this is inconsistent, it might lead to unexpected results.

bartaelterman commented 9 years ago

Had to update the standaard spider. Now all spiders return date time information without time zone.
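For reference, a rough sketch of the idea, not the actual spider code: drop any offset while keeping the wall-clock value, then check that nothing timezone-aware slipped through. The function names and the sample strings are hypothetical, and it assumes the scraped timestamps are ISO 8601 strings:

```python
from datetime import datetime


def to_verbatim(text):
    """Parse an ISO 8601 timestamp and keep only its wall-clock value.

    Hypothetical helper: the offset, if any, is discarded without
    converting, so 09:15+02:00 stays 09:15.
    """
    parsed = datetime.fromisoformat(text)
    return parsed.replace(tzinfo=None)


def all_naive(datetimes):
    """Return True if none of the datetimes carries time zone information."""
    return all(dt.tzinfo is None for dt in datetimes)


# Made-up samples: one source with an offset, one without.
scraped = ["2015-06-01T09:15:00+02:00", "2015-06-01T09:15:00"]
cleaned = [to_verbatim(text) for text in scraped]

print(cleaned[0] == cleaned[1])  # True
print(all_naive(cleaned))        # True
```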