ImperialCollegeLondon / covid19model

Code for modelling estimated deaths and cases for COVID19.
MIT License

Bugs in data #84

Closed MansMeg closed 4 years ago

MansMeg commented 4 years ago

Describe the bug: I have refactored much of the preprocessing and moved most of it into an R package so that tests can be added for the data (I'm adding my own). While doing this I found three (smallish) bugs in the data:

  1. In d1 for Spain, Self_isolate_if_ill is incorrectly set to the 14th of March, not the 17th as in the interventions.csv file.

  2. In d1 for Greece, the 3rd and 4th of March are missing, so the series is shifted by two days. This is padded in the Deaths variable but not in the Cases variable, so the two differ.

  3. Padding does not seem to work when adding additional countries. I have not delved deeper into this though. (I interpret padding as adding additional missing days with 0 cases and deaths)
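The reading of padding given in item 3 (missing days added with 0 cases and deaths) can be sketched as follows. This is a minimal Python illustration, not the repository's R code: `pad_missing_days`, the row layout, and the case counts are all hypothetical. Only the gap on the 3rd and 4th of March comes from item 2.

```python
from datetime import date, timedelta


def pad_missing_days(series, start, end):
    """Return one row per day from start to end (inclusive).

    Days that are absent from `series` are filled with 0 cases and 0 deaths.
    `series` is a list of dicts with keys "date", "cases" and "deaths"
    (a hypothetical layout chosen for this sketch).
    """
    by_day = {row["date"]: row for row in series}
    padded = []
    day = start
    while day <= end:
        padded.append(by_day.get(day, {"date": day, "cases": 0, "deaths": 0}))
        day += timedelta(days=1)
    return padded


# Invented values; only the missing 3rd and 4th of March mirror the report.
greece = [
    {"date": date(2020, 3, 2), "cases": 1, "deaths": 0},
    {"date": date(2020, 3, 5), "cases": 2, "deaths": 0},
]
padded = pad_missing_days(greece, date(2020, 3, 2), date(2020, 3, 5))
```

Padding cases and deaths in the same pass keeps both variables on the same date index, which is the property that fails for Greece in item 2.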

To Reproduce: Run the code, store the d1 variables, check the dates, and compare them with interventions.csv.

Expected behavior: The data should be the same as in the ECDC data and interventions.csv.
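A check of this kind could look roughly like the sketch below. It is written in Python for illustration only. The dictionary layout and the helper `find_date_mismatches` are assumptions, not part of the repository. The two dates are the ones reported for Spain in item 1.

```python
from datetime import date


def find_date_mismatches(processed, reference):
    """List (key, processed_date, reference_date) for every differing entry.

    Both arguments map (country, intervention) to a date. This layout is
    hypothetical and only serves the sketch.
    """
    mismatches = []
    for key in sorted(reference):
        if processed.get(key) != reference[key]:
            mismatches.append((key, processed.get(key), reference[key]))
    return mismatches


# Date as in interventions.csv versus date found in d1 (both from item 1).
reference = {("Spain", "Self_isolate_if_ill"): date(2020, 3, 17)}
processed = {("Spain", "Self_isolate_if_ill"): date(2020, 3, 14)}

mismatches = find_date_mismatches(processed, reference)
for (country, name), got, expected in mismatches:
    print(f"{country} {name}: d1 has {got}, interventions.csv has {expected}")
```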

Additional context: I have refactored most of the data processing and packaged it as an R package. That lets me check package dependencies and include tests of the R code. I now have test suites that assess whether this functionality is correct (i.e. gives results identical to your code, minus the bugs mentioned above). I could try to make a separate PR for this refactoring, if you are interested.

s-mishra commented 4 years ago

Hi @MansMeg, the Spain case is not a bug: during processing, when an intervention happens after lockdown, it is assigned the date of lockdown. We do this early on, when we read the covariates. Yes, Greece is a bug in the data that I know about. Happy to look at a PR and chat there about other details.

MansMeg commented 4 years ago

Cool. I'll open a PR as soon as I can.

A short question: I do not really understand why Spain is not a bug. If it should be the 14th, why is that not just specified in the interventions file? Is this difference only for Spain?

s-mishra commented 4 years ago

Hi @MansMeg, it is not in the interventions file because there we use the real date the country has published. But to make sense of lockdown, we assume that all interventions that happen after lockdown coincide with lockdown.
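The rule described here can be sketched as follows. This is Python for illustration, not the R code that reads the covariates. The function `clamp_to_lockdown`, the dictionary keys, and the `earlier_measure` entry are hypothetical. The lockdown date of 14 March is inferred from this thread, not taken from interventions.csv.

```python
from datetime import date


def clamp_to_lockdown(interventions, lockdown_key="lockdown"):
    """Move every intervention dated after lockdown back to the lockdown date.

    `interventions` maps an intervention name to its published date.
    Interventions on or before lockdown keep their original date.
    """
    lockdown = interventions[lockdown_key]
    return {name: min(day, lockdown) for name, day in interventions.items()}


spain = {
    "lockdown": date(2020, 3, 14),             # inferred from the thread
    "Self_isolate_if_ill": date(2020, 3, 17),  # published date, per item 1
    "earlier_measure": date(2020, 3, 10),      # invented, shows the unchanged case
}
processed = clamp_to_lockdown(spain)
```

Under this assumption `Self_isolate_if_ill` moves to the 14th while the earlier entry is unchanged, which matches the value seen in d1 for Spain.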

MansMeg commented 4 years ago

Ok. Thanks!