Cases greater than deaths

joseph-palmer commented 3 years ago

This PR adds tests that cases are greater than deaths for each region.

When using the stored data in custom_data, the tests fail for:

- Brazil at levels 1 and 2.
- France at level 2.
- UK at level 2.

Everything else passes; it looks like slicing the data causes issues with these countries.

On the full downloaded data (not just the first few rows), all pass except for Brazil level 2. The problem here is Cedro do Abaete, which has 12 cases and 47 deaths. Is this possible, or is it a flaw in the data? If Brazil is a known problem we can write an exception for it and inform users somehow.
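For reference, a minimal sketch of the kind of check described above, not the exact test in this PR. The helper name `find_deaths_exceeding_cases()` and the column names `cases_total` and `deaths_total` are assumptions for illustration, and the toy data only mirrors the 12 cases / 47 deaths example.

```r
# Hypothetical helper: return the rows where total deaths exceed total cases.
# Column names `cases_total` and `deaths_total` are assumed for illustration.
find_deaths_exceeding_cases <- function(data) {
  # Only compare rows where both counts are present
  has_both <- !is.na(data$cases_total) & !is.na(data$deaths_total)
  data[has_both & data$deaths_total > data$cases_total, , drop = FALSE]
}

# Toy data: one plausible region and one mirroring the Brazil level 2 example
toy <- data.frame(
  region = c("Region A", "Cedro do Abaete"),
  cases_total = c(500, 12),
  deaths_total = c(20, 47)
)

offending <- find_deaths_exceeding_cases(toy)
print(offending$region)
```

In a testthat test this could presumably be wrapped as `expect_equal(nrow(find_deaths_exceeding_cases(data)), 0)`, so a failure would list the offending regions.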

Currently this test gets run all the time, but I think it would make sense to attach it to the tests which download the full data every night (I'm not sure how to do this).

(Also, the test highlighted an error in Italy: cases were being used for deaths and tests. I have corrected this here.)

github-actions[bot] commented 3 years ago

👋 Thanks for opening this pull request! Can you please run through the following checklist before requesting review (ticking as complete or if not relevant).

- [ ] Read our contribution guidelines if you have not already done so.
- [ ] If you have altered an existing class please run the tests locally (using devtools::load_all(); devtools::test()) first setting options(testDownload=TRUE, testSource=class-name) and report your findings.
- [ ] If you have added a new data class please run the tests locally for that class (using devtools::load_all(); devtools::test()).
- [ ] Check your code passes our CI checks and review any style and code coverage warnings.
- [ ] Comment with details if unable to get our CI checks to pass or unable to remove all warnings.

Thank you again for the contribution. If making large scale changes consider using our pre-commit hooks (see the contributing guide) to more easily comply with our guidelines.

seabbs commented 3 years ago

FYI, when doing this kind of branching approach to PRs, you need to branch the features from the branch you are using to collect them. So the update tree is:

- merge master into main feature branch
- merge main feature branch into individual feature branch

Rather than pulling each out from master.

joseph-palmer commented 3 years ago

Makes sense. Putting a threshold of 100 fixes the Brazil problem with the full data, but I still get errors using the snapshot. Oddly, with a threshold of 1000 other countries start erroring...
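A rough sketch of one possible reading of the threshold, assuming it means "only check regions with at least this many cases". The helper name `flag_above_threshold()`, the `min_cases` argument, and the column names `cases_total` and `deaths_total` are all hypothetical and may not match what the PR actually does.

```r
# Hypothetical helper: flag regions where deaths exceed cases, skipping
# regions whose case count is below `min_cases` (an assumed interpretation
# of the threshold discussed in this thread).
flag_above_threshold <- function(data, min_cases = 100) {
  keep <- !is.na(data$cases_total) & !is.na(data$deaths_total) &
    data$cases_total >= min_cases
  checked <- data[keep, , drop = FALSE]
  checked[checked$deaths_total > checked$cases_total, , drop = FALSE]
}

# Toy data: the small region (12 cases, 47 deaths) drops out at min_cases = 100
toy_regions <- data.frame(
  region = c("Region A", "Small region"),
  cases_total = c(500, 12),
  deaths_total = c(20, 47)
)

flagged_default <- flag_above_threshold(toy_regions)
flagged_no_threshold <- flag_above_threshold(toy_regions, min_cases = 0)
print(nrow(flagged_default))
print(nrow(flagged_no_threshold))
```

Under this reading a higher threshold can only skip more regions, so new failures appearing at 1000 would suggest the threshold in the PR is applied differently.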

seabbs commented 3 years ago

Hmm that is strange indeed.

RichardMN commented 3 years ago

When I initially thought of this less-than-or-equal-to test I was thinking at a national level. (Edit: I look back at what I wrote and I did say regionally.) There will be countries with regions which might have no ICU capacity but where the ability to transfer patients exists, which could lead to deaths being counted in a jurisdiction separate from where the case is identified or confirmed. I'd suggest limiting the test to level 1 jurisdictions, but I could imagine medevacs from Nunavut to Ontario or from NWT to Alberta in Canada. I don't know whether they've happened, but there are countries whose level 1 regions vary wildly in population or medical facilities.

I was also thinking that they should be run as much as possible on full and current datasets, since they are to some extent a canary which could indicate a new problem in data download, cleaning or processing, as opposed to a problem introduced in the code. These are tests for the combined system of the code and the data sources, not just a test of the code with a fixed (and presumably sane) data source.

I don't know how to set them to run nightly either, unless we look at pushing them into the regional data tests used for the badges, or preparing a separate track of tests which are called from the same GitHub workflows.

github-actions[bot] commented 3 years ago

This PR has been flagged as stale due to lack of activity

epiforecasts / covidregionaldata

Cases greater than deaths #309