epiforecasts / covidregionaldata

An interface to subnational and national level COVID-19 data. For all supported countries, this includes a daily time series of cases. Wherever available, we also provide data on deaths, hospitalisations, and tests. National-level data is also supported using a range of data sources, as well as linelist data and links to intervention data sets.
https://epiforecasts.io/covidregionaldata/

Cases greater than deaths #309

joseph-palmer closed this 3 years ago

joseph-palmer commented 3 years ago

This PR adds tests checking that cases are greater than or equal to deaths for each region.

When using the stored data in custom_data, the tests fail for:

Everything else passes; it looks like slicing the data causes issues for these countries.

On the full downloaded data (not just the first few rows), everything passes except Brazil level 2. The problem there is Cedro do Abaete, which has 12 cases and 47 deaths. Is this plausible, or is it a flaw in the data? If Brazil is a known problem, we can write an exception for it and find a way to inform users.
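The per-region check described above can be sketched as follows. This is a minimal Python illustration, not the package's actual R test code; the row structure and the field names `cases_total` and `deaths_total` are assumptions made for the example. It compares cumulative totals at each region's latest date.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Assumed shape of one row of regional data (hypothetical field names).
    region: str
    date: str  # ISO "YYYY-MM-DD", so string order matches date order
    cases_total: int
    deaths_total: int


def latest_totals(rows):
    """Return {region: (cases_total, deaths_total)} at each region's latest date."""
    latest = {}
    for row in rows:
        current = latest.get(row.region)
        if current is None or row.date > current.date:
            latest[row.region] = row
    return {region: (r.cases_total, r.deaths_total) for region, r in latest.items()}


def regions_with_more_deaths_than_cases(rows):
    """List (sorted) the regions whose latest cumulative deaths exceed cumulative cases."""
    return sorted(
        region
        for region, (cases, deaths) in latest_totals(rows).items()
        if deaths > cases
    )
```

A hypothetical region whose latest row reports 12 cumulative cases and 47 cumulative deaths, like the Brazil example above, would be returned by `regions_with_more_deaths_than_cases`, and a test would then assert that the returned list is empty.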

Currently this test runs every time, but I think it would make sense to attach it to the tests that download the full data every night (I'm not sure how to do this).

(Also, the test highlighted an error in Italy: cases were being used for deaths and tests. I have corrected this here.)

github-actions[bot] commented 3 years ago

👋 Thanks for opening this pull request! Can you please run through the following checklist before requesting review (ticking as complete or if not relevant).

Thank you again for the contribution. If making large scale changes consider using our pre-commit hooks (see the contributing guide) to more easily comply with our guidelines.

seabbs commented 3 years ago

FYI, when taking this kind of branching approach to PRs, you need to branch the features from the branch you are using to collect them. So the update tree is

Rather than pulling each out from master.

joseph-palmer commented 3 years ago

Makes sense. Putting a threshold of 100 fixes the Brazil problem with the full data, but I still get errors using the snapshot. Oddly, with a threshold of 1000, other countries start erroring...
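The thread does not spell out what the threshold applies to. One plausible reading, assumed here, is a minimum number of cumulative cases a region needs before it is tested at all. The Python sketch below illustrates that reading; the function name and the `(cases, deaths)` input format are hypothetical. It also reports how many regions were actually tested, which makes it easy to see when a high threshold leaves little or nothing to check for a given country.

```python
def check_deaths_not_above_cases(totals, min_cases=100):
    """Test only regions with at least `min_cases` cumulative cases.

    `totals` maps region name -> (cumulative cases, cumulative deaths).
    Returns (sorted failing regions, number of regions tested).
    The threshold semantics are an assumption, not taken from the PR.
    """
    tested = {
        region: (cases, deaths)
        for region, (cases, deaths) in totals.items()
        if cases >= min_cases
    }
    failing = sorted(
        region for region, (cases, deaths) in tested.items() if deaths > cases
    )
    return failing, len(tested)
```

With a threshold, small regions with noisy counts drop out of the check; the cost is that genuine problems in those regions go unreported, so the tested count is worth logging alongside the result.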

seabbs commented 3 years ago

Hmm that is strange indeed.

RichardMN commented 3 years ago

When I initially thought of this less-than-or-equal-to test, I was thinking at a national level. (Edit: looking back at what I wrote, I did say regionally.) There will be countries with regions that have no ICU capacity but can transfer patients elsewhere, which could lead to deaths being counted in a different jurisdiction from the one where the case was identified or confirmed. I’d suggest limiting the test to level 1 jurisdictions, but even then I could imagine medevacs from Nunavut to Ontario or from NWT to Alberta in Canada; I don’t know whether they’ve happened, but there are countries whose level 1 regions vary wildly in population or medical facilities.
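The suggestion to test at a coarser level can be sketched by summing region totals into a parent jurisdiction (level 1 or national) before comparing, so that a death recorded in one region and a case recorded in a neighbouring one can cancel out. This is a minimal Python illustration; the `parent_of` mapping and the region names in the usage note are hypothetical.

```python
from collections import defaultdict


def aggregate_totals(totals, parent_of):
    """Sum (cases, deaths) from child regions into their parent jurisdiction.

    `totals` maps region -> (cumulative cases, cumulative deaths);
    `parent_of` maps region -> parent name (hypothetical lookup table).
    """
    summed = defaultdict(lambda: [0, 0])
    for region, (cases, deaths) in totals.items():
        parent = parent_of[region]
        summed[parent][0] += cases
        summed[parent][1] += deaths
    return {parent: tuple(pair) for parent, pair in summed.items()}


def parents_with_more_deaths_than_cases(totals, parent_of):
    """Apply the deaths <= cases check after aggregating to the parent level."""
    aggregated = aggregate_totals(totals, parent_of)
    return sorted(p for p, (cases, deaths) in aggregated.items() if deaths > cases)
```

For example, if hypothetical "Region X" reports 0 cases and 1 death while "Region Y" reports 50 cases and 2 deaths, and both belong to "Province 1", the region-level check flags Region X but the aggregated check passes. As noted above, transfers across level 1 boundaries would still require aggregating to the national level.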

I was also thinking that they should be run as much as possible on full and current datasets, since they are to some extent a canary that could indicate a new problem in data download, cleaning or processing, as opposed to a problem introduced in the code. These are tests of the combined system of code and data sources, not just tests of the code against a fixed (and presumably sane) data source.

I don’t know how to set them to run nightly either, unless we look at pushing them into the regional data tests used for the badges, or at preparing a separate track of tests called from the same GitHub workflows.
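One common way to keep full-data checks out of every run is to gate them on an environment variable that only a scheduled (nightly) workflow sets. In the R package, testthat's skip helpers such as `skip_if_not()` would play this role; the Python sketch below only shows the gating idea. The variable name `FULL_DATA_TESTS` and the `load_full_totals` placeholder are hypothetical.

```python
import os
import unittest


def full_data_enabled(env=os.environ):
    """True only when the hypothetical FULL_DATA_TESTS variable is 'true'."""
    return env.get("FULL_DATA_TESTS", "").lower() == "true"


def load_full_totals():
    """Hypothetical placeholder: download the full, current dataset and
    return {region: (cumulative cases, cumulative deaths)}."""
    raise NotImplementedError("connect this to the real download step")


class FullDataChecks(unittest.TestCase):
    # Skipped in ordinary runs; the nightly job sets FULL_DATA_TESTS=true.
    @unittest.skipUnless(full_data_enabled(), "full-data checks run only in the nightly job")
    def test_deaths_not_above_cases(self):
        totals = load_full_totals()
        failing = sorted(r for r, (cases, deaths) in totals.items() if deaths > cases)
        self.assertEqual(failing, [])
```

Because the skip decision is made from the environment, the same test file can be called from the existing GitHub workflows, with only the scheduled workflow exporting the variable.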

github-actions[bot] commented 3 years ago

This PR has been flagged as stale due to lack of activity