beoutbreakprepared / nCoV2019

Location for summaries and analysis of data related to n-CoV 2019, first reported in Wuhan, China
MIT License

Thanks for publishing the data, and some identified data quality issues (admin field, duplication) #43

Open steven4320555 opened 4 years ago

steven4320555 commented 4 years ago

Thanks for publishing the dataset. It provides a possible structure for looking at anonymised individual-level data. The methodology sounds reasonable, but the quality of the data can definitely improve over time.

For example, some cells in the admin columns contain links.

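A rough sketch of how such cells could be flagged automatically, assuming the sheet is exported to CSV and loaded as a list of dicts (e.g. with `csv.DictReader`). The column names `admin1`/`admin2`/`admin3` and `ID` are assumptions about the export, and the two rows below are made-up toy data, not records from the dataset.

```python
import re

# Assumed column names; the real export may name the admin columns differently.
ADMIN_COLUMNS = ("admin1", "admin2", "admin3")

# Rough heuristic: a cell starting with http(s):// or www. is treated as a link.
URL_PATTERN = re.compile(r"^\s*(https?://|www\.)", re.IGNORECASE)


def find_links_in_admin_columns(rows, columns=ADMIN_COLUMNS):
    """Return (row_index, column, value) for every admin cell that looks like a URL."""
    hits = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col) or ""  # missing or empty cells become ""
            if URL_PATTERN.match(value):
                hits.append((i, col, value))
    return hits


# Toy rows for illustration only.
rows = [
    {"ID": "1", "admin1": "Hubei", "admin2": "Wuhan", "admin3": ""},
    {"ID": "2", "admin1": "https://example.org/report", "admin2": "", "admin3": ""},
]
print(find_links_in_admin_columns(rows))
# [(1, 'admin1', 'https://example.org/report')]
```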

There are also cases where the same reference is cited for different IDs. (As an aside, I found that the original link used in the data has been updated to https://www.gov.uk/government/news/cmo-for-england-announces-4-new-cases-of-novel-coronavirus-2-march-2020, rather than 2021 as referenced in the data.)

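A minimal sketch for listing references shared by several IDs, under the same assumptions (rows loaded as dicts; the `source` and `ID` column names are assumed). The output is only a list of candidates to review by hand: a single press release can legitimately describe several cases (the linked gov.uk release announces 4 new cases), so a shared source does not by itself prove duplication. The URLs below are placeholders.

```python
from collections import defaultdict


def ids_sharing_a_source(rows, source_col="source", id_col="ID"):
    """Map each source to the sorted IDs citing it, keeping only sources cited by 2+ IDs."""
    ids_by_source = defaultdict(set)
    for row in rows:
        # Light normalisation so "…/page/" and "…/page" count as the same link.
        source = (row.get(source_col) or "").strip().rstrip("/")
        row_id = row.get(id_col)
        if source and row_id is not None:
            ids_by_source[source].add(row_id)
    return {src: sorted(ids) for src, ids in ids_by_source.items() if len(ids) > 1}


# Toy rows for illustration only.
rows = [
    {"ID": "101", "source": "https://example.org/press-release/"},
    {"ID": "102", "source": "https://example.org/press-release"},
    {"ID": "103", "source": "https://example.org/other"},
]
print(ids_sharing_a_source(rows))
# {'https://example.org/press-release': ['101', '102']}
```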

Looking at the symptoms field, it seems to me that some data standardisation was attempted (for example, 22 and 23). I hope the descriptions can be made more consistent.

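One possible way to get consistent descriptions is to map free-text symptom terms onto a small controlled vocabulary. The sketch below is hypothetical: the synonym table is invented for illustration and would need to be built from the values actually present in the symptoms field. Unknown terms are kept (lowercased) rather than dropped, so nothing disappears silently.

```python
import re

# Hypothetical synonym table for illustration; not taken from the dataset.
SYNONYMS = {
    "fever": "fever",
    "high fever": "fever",
    "pyrexia": "fever",
    "cough": "cough",
    "dry cough": "cough",
    "sore throat": "sore throat",
    "pharyngalgia": "sore throat",
}


def standardise_symptoms(raw):
    """Split a free-text symptoms cell on commas/semicolons and map each term to a canonical label."""
    terms = [t.strip().lower() for t in re.split(r"[,;]", raw or "")]
    canonical = []
    for term in terms:
        if not term:
            continue
        label = SYNONYMS.get(term, term)  # unknown terms pass through unchanged
        if label not in canonical:  # drop repeats while keeping first-seen order
            canonical.append(label)
    return canonical


print(standardise_symptoms("High fever; Dry cough, pharyngalgia"))
# ['fever', 'cough', 'sore throat']
```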

I hope the data quality will improve over time. Good work!

smazrouee commented 4 years ago

I wonder how they extracted these symptoms and assigned each case an ID along with age, location, etc. I checked some of the publications, and they contain no specific information about individuals. I don't see anything under the Wiki tab either. Is there anywhere that explains how they collect and refine this data? Thanks.