Open steven4320555 opened 4 years ago
I wonder how they extracted these symptoms and assigned each case an ID with age, location, etc. I checked some of the publications, but they contain no specific information about individuals, and I don't see anything under the wiki tab. Is there a place where they explain how they collect and refine this data? Thanks.
Thanks for publishing the dataset. It offers a possible structure for looking at anonymised individual-level data. The methodology reads as sound, but the data quality can definitely improve over time.
For example, there are some links in the admin columns.
There are also cases where the same reference refers to different IDs. (As a bonus, I found that the original link used in the data has been updated to https://www.gov.uk/government/news/cmo-for-england-announces-4-new-cases-of-novel-coronavirus-2-march-2020 , which ends in 2020 rather than 2021 as referenced in the data.)
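Both checks above could be automated. Here is a rough sketch, assuming the data is a CSV with columns named `ID`, `admin1`, `admin2`, `admin3` and `source`. Those column names and the sample rows are my own placeholders, not taken from the dataset, so adjust them to the real header. A source shared by several IDs is not necessarily an error (one announcement can cover several cases), so the second check only produces a list for manual review.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; adjust to the actual dataset header.
ADMIN_COLUMNS = ["admin1", "admin2", "admin3"]


def find_links_in_admin(rows, admin_columns=ADMIN_COLUMNS):
    """Return (ID, column, value) for admin cells that look like URLs."""
    hits = []
    for row in rows:
        for col in admin_columns:
            value = (row.get(col) or "").strip()
            if value.lower().startswith(("http://", "https://")):
                hits.append((row["ID"], col, value))
    return hits


def ids_per_source(rows):
    """Map each source cited by more than one ID to the sorted list of those IDs."""
    grouped = defaultdict(set)
    for row in rows:
        source = (row.get("source") or "").strip()
        if source:
            grouped[source].add(row["ID"])
    return {src: sorted(ids) for src, ids in grouped.items() if len(ids) > 1}


# Made-up sample rows, only to show the shape of the checks.
SAMPLE = """ID,admin1,admin2,admin3,source
001,England,,,https://example.org/a
002,https://example.org/b,,,https://example.org/a
003,Scotland,,,https://example.org/c
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(find_links_in_admin(rows))  # admin cells containing a link
print(ids_per_source(rows))       # sources shared by several IDs
```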
Looking at the symptoms field, it seems to me that some data standardisation was attempted (for example 22, 23). It would be good to get more consistent descriptions.
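As an illustration of what a first standardisation pass could look like, here is a minimal sketch. It assumes the symptoms field is free text with terms separated by commas, semicolons or colons, which is my guess rather than something documented for this dataset. The function `normalise_symptoms` is hypothetical and only lowercases, trims and de-duplicates terms; it does not map synonyms.

```python
import re


def normalise_symptoms(raw):
    """Split a free-text symptoms string into lowercase, de-duplicated terms."""
    terms = []
    for part in re.split(r"[,;:]", raw):
        # Lowercase and collapse repeated whitespace inside each term.
        term = " ".join(part.lower().split())
        if term and term not in terms:
            terms.append(term)
    return terms


print(normalise_symptoms("Fever; cough,  Sore  throat:fever"))
# ['fever', 'cough', 'sore throat']
```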
I hope the data quality can be improved over time. Good work!