covid19datahub / COVID19

A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution
https://covid19datahub.io
GNU General Public License v3.0

Added Taiwan government data, only confirmed cases #116

Closed jonekeat closed 4 years ago

jonekeat commented 4 years ago

What type of PR is this?

Select all that apply: [ ] -> [x]

R

I confirm:

Additional comments

I referred to the to-do list and tried adding Taiwan local government data, but the URL listed there does not provide what we want (confirmed, tests, ...). Instead, I found two relevant datasets: 1) Taiwan Latest COVID Cases, Tests Statistics and 2) Regional Age and Sex Statistics Table-Severe Special Infectious Pneumonia-Statistics by Case Study Day.

This PR implements the 2nd dataset, which contains confirmed cases by county, gender, imported, and age_group. The 1st dataset contains only the latest figures and has no date column, so I am not sure whether to include it. Please advise, thanks.

eguidotti commented 4 years ago

Hello @jonekeat, and thank you for your pull request. This is of great help. I fine-tuned the code, which is now almost ready to merge. I removed the filling of missing values, as this is done automatically by the library. E.g.:

x <- covid19("TWN") # raw data only
y <- covid19("TWN", raw = FALSE) # missing values are filled with the previous non-missing value 
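For readers unfamiliar with this kind of filling, here is a minimal base R sketch of carrying the previous non-missing value forward. The helper name `fill_forward` and the toy vector are hypothetical and only illustrate the idea; this is not the library's actual implementation.

```r
# Hypothetical helper (not part of the COVID19 package): replace each NA
# with the most recent non-missing value that precedes it.
fill_forward <- function(v) {
  idx <- cumsum(!is.na(v))        # position of the latest non-missing value so far
  c(NA, v[!is.na(v)])[idx + 1]    # leading NAs have nothing to copy and stay NA
}

# Toy cumulative counts with gaps
confirmed <- c(NA, 1, NA, NA, 5, NA)
fill_forward(confirmed)
# [1] NA  1  1  1  5  5
```

In a real table this would be applied per location, with rows sorted by date.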

I see that the dataset contains data for admin area level 2 (counties) and I added support for them. It would be of great help if you could update this file with the missing information, e.g. name of the counties in English, latitude, longitude, and population.

Regarding the other dataset, unfortunately we are not able to include it without the date column. Many thanks!

jonekeat commented 4 years ago

Hi @eguidotti ,

Thanks for reviewing my pull request. Regarding the county level, I initially thought of including it as well, but found that the majority of county names (~88.63%) are "空值", which means NULL. If we want to include the county level, should I put "-" or NULL in the field?

eguidotti commented 4 years ago

Hi @jonekeat, thank you for your message. I'm sorry, I didn't know "空值" means NULL. I dropped it from the R and csv files. Very little data is left, as you say. On the other hand, this looks like the official data provider, so I'm afraid there is not much we can do about it. Let me know if you prefer to fill out the csv file to support level 2 or just drop it and merge this pull request as is. Thanks!
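As a rough illustration of what dropping the "空值" entries involves, the sketch below treats the placeholder as missing, measures how much of the data it covers, and keeps only the rows with a real county name. The data frame and its column names are made up for the example and are not taken from the PR.

```r
# Toy data; the column names are hypothetical, not those of the Taiwan dataset.
x <- data.frame(
  county    = c("空值", "台北市", "空值", "空值"),
  confirmed = c(3, 1, 2, 4),
  stringsAsFactors = FALSE
)

# Treat the placeholder string as a proper missing value
x$county[x$county == "空值"] <- NA

# Share of rows without a county name (0.75 for this toy data)
share_missing <- mean(is.na(x$county))

# Keep only rows that can be assigned to an admin area level 2
x2 <- x[!is.na(x$county), ]
```

Converting the placeholder to `NA` first, rather than to "-", keeps the missing rows distinguishable from real names and lets standard tools such as `is.na()` handle them.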

jonekeat commented 4 years ago

Hi @eguidotti, no worries, I didn't realize its meaning at first either (even though I am quite confident in my Chinese skills), and only found out after googling it. I have filled in the English county names, latitude, and longitude; these were simply looked up on Google. The population data is a bit outdated (mostly 2016) and comes from this website. Please let me know if it looks good to you, XD

Btw, please help me correct the original Chinese county names. I tried my best to save the Chinese characters in the csv on Windows, but... no luck, I guess (probably a UTF-8 support issue, TT)
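For the encoding problem, one option that may help is to write and read the csv with an explicit UTF-8 encoding in R instead of relying on the Windows default. This is a sketch under assumptions: the column names and file path are hypothetical, and behaviour can differ on older R versions for Windows that do not use UTF-8 as the native encoding.

```r
# Hypothetical example table; "\u53f0\u5317\u5e02" is the escaped form of 台北市,
# so the string is UTF-8 regardless of how this script file itself is saved.
x <- data.frame(
  id      = "TWN-1",
  name_zh = "\u53f0\u5317\u5e02",
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")

# Write the file as UTF-8 rather than in the platform's native encoding
write.csv(x, path, row.names = FALSE, fileEncoding = "UTF-8")

# Declare the strings as UTF-8 when reading them back
y <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Check that the Chinese name survived the round trip
identical(y$name_zh, x$name_zh)
```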

eguidotti commented 4 years ago

Thanks @jonekeat, the file was perfect. I fixed the remaining issues and merged. You will find your name in the contributors section of the website and will receive the badge via email shortly. Thank you for your help! Don't hesitate to help further with new data sources.