biglocalnews / covid-world-scraper

scrapers for the pitch map
ISC License

Assess time information for all countries #48

Closed: zstumgoren closed this issue 4 years ago

zstumgoren commented 4 years ago

For all countries that have scrapers (converted or not), we need to assess what time information, if any, we can gather about the data.

Ideally, we need to know the datetime and timezone of the data as reflected on the site. However, not all countries make this information clear. Korea is a case in point: it provides only the hour, day, and month, with no clear indication of whether this is UTC or local time.

Before we update all of our scraper code to include dates, let's do an assessment of the countries on our list and add details on the date and time info they provide:

Dilcia19 commented 4 years ago

Brazil:

Atualizado em: 25/06/2020 19:00
Translated: "Updated: 25/06/2020 19:00"
Date stamp, no time zone or UTC indicator
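A minimal Python sketch of how a scraper might parse Brazil's label, assuming the page text reads exactly as quoted above. The helper name `parse_brazil_updated` is hypothetical. Because the page states no zone (and Brazil spans more than one time zone), the sketch leaves the result naive rather than guessing one.

```python
from datetime import datetime


def parse_brazil_updated(raw: str) -> datetime:
    """Hypothetical helper: parse 'Atualizado em: DD/MM/YYYY HH:MM'.

    The source gives no zone, so no tzinfo is attached (naive datetime).
    """
    prefix, _, stamp = raw.partition(":")
    if prefix.strip() != "Atualizado em":
        raise ValueError(f"unexpected Brazil label: {raw!r}")
    # Day comes first in the Brazilian format.
    return datetime.strptime(stamp.strip(), "%d/%m/%Y %H:%M")


updated = parse_brazil_updated("Atualizado em: 25/06/2020 19:00")
print(updated)         # 2020-06-25 19:00:00
print(updated.tzinfo)  # None
```

Keeping the value naive makes the missing zone explicit downstream, instead of hiding a guess inside an aware datetime.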

Korea:

as of 12am 06.25.
Date stamp (no year), no time zone or UTC indicator
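Korea's label lacks a year as well as a zone. The sketch below assumes "06.25" is month.day (the only possible reading, since there is no 25th month) and makes the caller pass the year explicitly, so that guess is visible at the call site. `parse_korea_as_of` and `assumed_year` are hypothetical names, and the result stays naive.

```python
import re
from datetime import datetime

# "as of 12am 06.25.": 12-hour clock, then month.day; no year, no zone.
KOREA_STAMP = re.compile(
    r"as of\s+(\d{1,2})\s*(am|pm)\s+(\d{1,2})\.(\d{1,2})", re.IGNORECASE
)


def parse_korea_as_of(raw: str, assumed_year: int) -> datetime:
    """Hypothetical helper: the year is an assumption supplied by the caller."""
    m = KOREA_STAMP.search(raw)
    if m is None:
        raise ValueError(f"unrecognized Korea timestamp: {raw!r}")
    hour12, meridiem, month, day = m.groups()
    # 12-hour to 24-hour clock: 12am -> 0, 12pm -> 12, 3pm -> 15.
    hour = int(hour12) % 12 + (12 if meridiem.lower() == "pm" else 0)
    return datetime(assumed_year, int(month), int(day), hour)


print(parse_korea_as_of("as of 12am 06.25.", assumed_year=2020))  # 2020-06-25 00:00:00
```

The 12-hour conversion is done by hand rather than with `%p`, which avoids any dependence on the process locale.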

India:

as on : 25 June 2020, 08:00 IST (GMT+5:30)
Date stamp, time zone/UTC indicator

Germany:

Stand: 25.6.2020, 00:00 Uhr (online aktualisiert um 08:30 Uhr)
Translated: "As of June 25, 2020, 00:00 (updated online at 08:30 a.m.)"
Date stamp, no time zone or UTC indicator
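Germany's label carries two times: the data cut-off ("Stand") and the time the figures went online. A minimal sketch that keeps them separate, assuming the German text appears exactly as quoted above. The helper name is hypothetical, and so is the assumption that the online update falls on the same calendar day as the cut-off; both values stay naive.

```python
import re
from datetime import datetime

GERMANY_STAMP = re.compile(
    r"Stand:\s*(\d{1,2})\.(\d{1,2})\.(\d{4}),\s*(\d{1,2}):(\d{2})\s*Uhr"
    r"(?:\s*\(online aktualisiert um\s*(\d{1,2}):(\d{2})\s*Uhr\))?"
)


def parse_germany_stand(raw: str):
    """Hypothetical helper: return (data_as_of, online_update); update may be None."""
    m = GERMANY_STAMP.search(raw)
    if m is None:
        raise ValueError(f"unrecognized Germany timestamp: {raw!r}")
    day, month, year, hour, minute, up_hour, up_minute = m.groups()
    as_of = datetime(int(year), int(month), int(day), int(hour), int(minute))
    online_update = None
    if up_hour is not None:
        # Assumption: the online update happened on the same calendar day.
        online_update = as_of.replace(hour=int(up_hour), minute=int(up_minute))
    return as_of, online_update


as_of, online_update = parse_germany_stand(
    "Stand: 25.6.2020, 00:00 Uhr (online aktualisiert um 08:30 Uhr)"
)
print(as_of)          # 2020-06-25 00:00:00
print(online_update)  # 2020-06-25 08:30:00
```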

Nigeria:

Thursday 4:14 pm 25 Jun 2020
Date stamp, no time zone or UTC indicator
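Nigeria's label includes the weekday, which allows a cheap sanity check on the parsed date. A minimal sketch, assuming the text appears as quoted above; `parse_nigeria_stamp` is a hypothetical name. Note that `strptime` does not verify a `%A` weekday against the date, so the sketch strips the weekday and compares it by hand. The result is still naive because no zone is given.

```python
from datetime import datetime

# Fixed English names so the comparison does not depend on the process locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def parse_nigeria_stamp(raw: str) -> datetime:
    """Hypothetical helper: parse 'Thursday 4:14 pm 25 Jun 2020' and check the weekday."""
    weekday_name, rest = raw.strip().split(" ", 1)
    # %b and %p assume an English (C) locale, which is Python's default for LC_TIME.
    stamp = datetime.strptime(rest, "%I:%M %p %d %b %Y")
    if WEEKDAYS[stamp.weekday()] != weekday_name:
        raise ValueError(f"weekday {weekday_name!r} does not match {stamp.date()}")
    return stamp


print(parse_nigeria_stamp("Thursday 4:14 pm 25 Jun 2020"))  # 2020-06-25 16:14:00
```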

Pakistan:

25 Jun 2020 - 09:01 am (GMT+5)
Date stamp, time zone/UTC indicator
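India and Pakistan both append an explicit "(GMT+…)" offset, so their stamps can become timezone-aware datetimes and be normalized to UTC. A minimal sketch, assuming the strings read exactly as quoted in this comment; the `RULES` table and helper names are hypothetical, and the `strptime` formats assume an English (C) locale.

```python
import re
from datetime import datetime, timedelta, timezone

# "(GMT+5:30)" or "(GMT+5)": sign, hours, optional minutes.
GMT_OFFSET = re.compile(r"\(GMT([+-])(\d{1,2})(?::(\d{2}))?\)")

# Hypothetical per-country rules: regex capturing the local date/time text,
# plus the strptime format for that text.
RULES = {
    "India": (re.compile(r"as on\s*:\s*(.+?)\s+IST"), "%d %B %Y, %H:%M"),
    "Pakistan": (re.compile(r"^(.+?)\s*\(GMT"), "%d %b %Y - %I:%M %p"),
}


def parse_offset(raw: str) -> timezone:
    """Turn the '(GMT±H[:MM])' suffix into a fixed-offset tzinfo."""
    m = GMT_OFFSET.search(raw)
    if m is None:
        raise ValueError(f"no GMT offset in {raw!r}")
    sign, hours, minutes = m.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return timezone(delta if sign == "+" else -delta)


def parse_aware(country: str, raw: str) -> datetime:
    """Parse a stamp that states its own offset; return an aware datetime."""
    pattern, fmt = RULES[country]
    m = pattern.search(raw)
    if m is None:
        raise ValueError(f"unrecognized {country} timestamp: {raw!r}")
    local = datetime.strptime(m.group(1), fmt)
    return local.replace(tzinfo=parse_offset(raw))


india = parse_aware("India", "as on : 25 June 2020, 08:00 IST (GMT+5:30)")
pakistan = parse_aware("Pakistan", "25 Jun 2020 - 09:01 am (GMT+5)")
print(india.astimezone(timezone.utc))     # 2020-06-25 02:30:00+00:00
print(pakistan.astimezone(timezone.utc))  # 2020-06-25 04:01:00+00:00
```

The offset is taken from the printed "(GMT…)" text rather than from a named zone such as "IST", since abbreviations like that are ambiguous on their own.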

South Africa:

2020-06-23T21:29:28+02:00 (in the page source; displayed as "Jun 23rd, 2020")
Date stamp, time zone/UTC indicator
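South Africa's value is ISO 8601 with an offset, so the standard library can read it directly with `datetime.fromisoformat`. A minimal sketch that also cross-checks it against the displayed date string quoted above; `parse_south_africa` is a hypothetical name.

```python
import re
from datetime import datetime, timezone


def parse_south_africa(iso_stamp: str, display: str) -> datetime:
    """Hypothetical helper: parse the ISO value and confirm it matches the shown date."""
    published = datetime.fromisoformat(iso_stamp)  # aware: the offset is in the string
    # Drop ordinal suffixes ("23rd" -> "23") so strptime can read the display text.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", display)
    shown = datetime.strptime(cleaned, "%b %d, %Y").date()
    if published.date() != shown:
        raise ValueError(f"source {published.date()} != displayed {shown}")
    return published


published = parse_south_africa("2020-06-23T21:29:28+02:00", "Jun 23rd, 2020")
print(published.astimezone(timezone.utc))  # 2020-06-23 19:29:28+00:00
```

The date comparison uses the local (+02:00) calendar date, which is what the page displays.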

Dilcia19 commented 4 years ago

For PDFs

Indonesia:

Myanmar:

Russia:

Spain:

Dilcia19 commented 4 years ago

@zstumgoren See my findings in the two comments above. Most of the sites don't provide a time zone, but almost all have some sort of time stamp.

zstumgoren commented 4 years ago

@Dilcia19 Ugh. OK, I don't think the source-provided dates are going to be reliable (i.e., publishable) given the ambiguity. Thanks for the quick turnaround. I'm going to send the Pitch folks a note to this effect and will CC you and Cheryl.

Dilcia19 commented 4 years ago

Sounds good. Closing this out.

zstumgoren commented 4 years ago

@Dilcia19 Heads up: it appears South Africa provides local timezone info in the page source (the scraper is actually pulling and outputting this data). However, the visual display (in the browser) seems to show only the month, day, and year, without any clear timezone info. I just wanted to circle back to see whether any other countries also use this strategy. Perhaps Korea is the edge case?
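One way to check other countries for the same pattern is to scan the page source for machine-readable timestamps instead of the visible text. The sketch below assumes the value sits in the `datetime` attribute of an HTML `<time>` element. That is a common convention, but it is an assumption here, since the thread doesn't say where South Africa's value lives. It uses only the standard library's `html.parser` and keeps only values that carry a UTC offset. `TimeTagCollector`, `zoned_timestamps`, and the sample markup are hypothetical.

```python
from datetime import datetime
from html.parser import HTMLParser


class TimeTagCollector(HTMLParser):
    """Collect (datetime attribute, visible text) pairs from <time> elements.

    Assumption: the machine-readable value lives in <time datetime="...">.
    """

    def __init__(self):
        super().__init__()
        self.found = []
        self._attr = None  # datetime attribute of the <time> tag being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            self._attr = dict(attrs).get("datetime")
            self._text = []

    def handle_data(self, data):
        if self._attr is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "time" and self._attr is not None:
            self.found.append((self._attr, "".join(self._text).strip()))
            self._attr = None


def zoned_timestamps(html: str):
    """Return (aware datetime, visible text) for <time> values that include an offset."""
    parser = TimeTagCollector()
    parser.feed(html)
    parser.close()
    results = []
    for machine, visible in parser.found:
        try:
            stamp = datetime.fromisoformat(machine)
        except ValueError:
            continue  # not a value fromisoformat can read
        if stamp.tzinfo is not None:  # date-only or naive values are skipped
            results.append((stamp, visible))
    return results


# Hypothetical markup, modeled on the South Africa values quoted in this thread.
page = '<p>Posted <time datetime="2020-06-23T21:29:28+02:00">Jun 23rd, 2020</time></p>'
for stamp, visible in zoned_timestamps(page):
    print(visible, "->", stamp.isoformat())  # Jun 23rd, 2020 -> 2020-06-23T21:29:28+02:00
```

Running something like this against each country's saved HTML would flag pages that expose an offset in the source even when the browser display hides it.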

zstumgoren commented 4 years ago

Oh crap, my bad. I totally see now that you mentioned South Africa and Pakistan, and I must have missed those in my initial pass. Sorry for the confusion!