OxCGRT / covid-policy-tracker

Systematic dataset of Covid-19 policy, from Oxford University
https://www.bsg.ox.ac.uk/covidtracker

Issues with C1 school closing for UK #56

Closed JiayaoL closed 3 years ago

JiayaoL commented 3 years ago

Hi,

Fantastic work!

I am using the covid-policy-tracker data and found some discrepancies in C1 school closing for the UK. For example, when C1_Flag = 1, C1_School Closing has two distinct entries (e.g., 2 and 3) while the other variables (e.g., date) stay the same. It does not seem to come from different regions, because there is no information under "subregion". Do you know how to interpret this, and which value to take at the national level? Thanks.

Best, Jiayao

saptahash commented 3 years ago

Hi @JiayaoL,

Thanks a lot for the question! I did just look through the UK national/subnational data, but I cannot find the instance you're referring to.

Would you be able to provide the RegionCode-Date combination where you observe this problem?

Thanks!

JiayaoL commented 3 years ago

Hi @saptahash,

Thank you for looking into this. Apologies, I should have clarified it a bit more.

There is no information in RegionCode for the UK when I select rows where C1_Flag = 1 or C1_Flag is NA.

For example, if you look at the data for the UK on 2020-06-08, there is 1 record with C1_School.closing = 2 and 3 records with C1_School.closing = 3, when C1_Flag = 1 or C1_Flag is NA.

I read the data in with:

oxcgrt <- read.csv(curl("https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv"))

Best, Jiayao

saptahash commented 3 years ago

Hi @JiayaoL,

Thanks for the detailed description! Again, I wasn't able to reproduce the problem, but I think I know why you observe this. Can you confirm whether RegionName and RegionCode appear as NA in your set-up?

My guess is that R is parsing the Region* columns as logical rather than character. To circumvent this, I use the following snippet in my code. Could you try this to see if it works?

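library(readr)  # provides read_csv(), cols() and col_character()

url_oxcgrt <- "https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_latest.csv"

# Read the Region* columns as character so they are not guessed as logical
oxcgrtdata <- read_csv(url(url_oxcgrt),
                       col_types = cols(RegionName = col_character(), RegionCode = col_character()))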

Other ways to do this without readr : https://stackoverflow.com/questions/2805357/specifying-colclasses-in-the-read-csv
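For reference, a minimal base R sketch of the same idea, using the colClasses argument of read.csv. The CSV text below is a toy stand-in for OxCGRT_latest.csv: its rows, the region names and codes (Region A, XX_A, ...) and the C1 values are invented for illustration and are not taken from the real file.

```r
# Toy CSV standing in for OxCGRT_latest.csv; all rows and values are invented.
csv_text <- "CountryCode,RegionName,RegionCode,Date,C1_School closing,C1_Flag
GBR,,,2020-06-08,3,1
GBR,Region A,XX_A,2020-06-08,3,1
GBR,Region B,XX_B,2020-06-08,2,1
GBR,Region C,XX_C,2020-06-08,3,1"

# Force the Region* columns to character; the remaining columns are guessed as usual.
# read.csv turns the header "C1_School closing" into the name C1_School.closing.
oxcgrt <- read.csv(
  text = csv_text,
  colClasses = c(RegionName = "character", RegionCode = "character"),
  stringsAsFactors = FALSE
)

# In a character column, blank fields are read as "" rather than NA,
# so the national row can be separated from the subnational ones.
national <- oxcgrt[oxcgrt$RegionCode == "", ]
print(national[, c("CountryCode", "Date", "C1_School.closing", "C1_Flag")])
```

To try this on the real data, the text = csv_text argument could be swapped for the file URL as the first argument, assuming the column names in the file match the ones used here.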

A snapshot of what the data looks like in my set-up: [image]

Update: I just ran the code you used as well, and I still get the same result as in the snapshot above.

JiayaoL commented 3 years ago

Hi @saptahash ,

Ah, I see! Exactly, both the RegionName and RegionCode appear as NA under my set-up.

Thanks for sharing the example and the snapshot. However, the columns still show as NA when I run your code. I can see that region information is actually included if I generate a frequency table of the contents of RegionCode. I think the issue is on my side, and I need to fix the set-up of my program first.
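A frequency-table check of this kind might look like the sketch below. The data frame is a toy stand-in for the UK rows on a single date; the region codes and C1 values are invented, not copied from the dataset.

```r
# Toy data standing in for the UK rows on one date; all values are invented.
uk <- data.frame(
  RegionCode = c("", "XX_A", "XX_B", "XX_C"),
  C1_School.closing = c(3, 3, 2, 3),
  stringsAsFactors = FALSE
)

# Frequency table of RegionCode; useNA = "ifany" also counts NA entries if any exist.
region_counts <- table(uk$RegionCode, useNA = "ifany")
print(region_counts)

# Cross-tabulate region against the C1 value to see which rows carry which value.
c1_by_region <- table(uk$RegionCode, uk$C1_School.closing)
print(c1_by_region)
```

If several distinct region codes show up here, the repeated rows per date are subnational records rather than conflicting national values.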

Thank you for your help!

Best, Jiayao