Closed joseph-palmer closed 3 years ago
👋 Thanks for opening this pull request! Can you please run through the following checklist before requesting review (ticking items as complete, or as not relevant): run the tests locally (`devtools::load_all(); devtools::test()`), then run them against the full data (first setting `options(testDownload=TRUE, testSource=class-name)`, then running `devtools::load_all(); devtools::test()`) and report your findings. Thank you again for the contribution. If making large-scale changes, consider using our pre-commit hooks (see the contributing guide) to more easily comply with our guidelines.
FYI, when taking this kind of branching approach to PRs you need to branch the features from the branch you are using to collect them, so the update tree is master → collection branch → feature branches, rather than pulling each feature out from master.
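The branching above can be sketched with git (branch names are hypothetical; the throwaway repository is only there to make the example self-contained):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "init"
git branch -M master

# Cut the collection branch from master once, then cut each feature branch
# from the collection branch rather than from master.
git checkout -q -b add-data-tests master
git checkout -q -b feature/cases-ge-deaths add-data-tests

# The feature's merge base with the collection branch is the collection tip,
# so features merge back into the collection branch cleanly.
test "$(git merge-base feature/cases-ge-deaths add-data-tests)" = "$(git rev-parse add-data-tests)"
echo ok
```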
Makes sense. Setting a threshold of 100 fixes the Brazil problem with the full data, but I still get errors when using the snapshot; oddly, with a threshold of 1000 other countries start erroring.
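For reference, the thresholded version of the check looks roughly like this (the column names and helper shape are illustrative, not the actual implementation):

```r
# Illustrative sketch: only flag regions where cumulative deaths exceed
# cumulative cases by more than a tolerance, to absorb reporting artefacts.
check_cases_vs_deaths <- function(data, threshold = 100) {
  # `data` is assumed to have cumulative `cases_total` and `deaths_total`
  # columns, one row per region (column names illustrative).
  data[!is.na(data$cases_total) & !is.na(data$deaths_total) &
         data$deaths_total > data$cases_total + threshold, ]
}

# testthat usage: expect_equal(nrow(check_cases_vs_deaths(df, 100)), 0)
```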
Hmm that is strange indeed.
When I initially thought of testing this less-than-or-equal check I was thinking at a national level. (Edit: looking back at what I wrote, I did say regionally.) There will be countries with regions that have no ICU capacity but where patients can be transferred, which could lead to deaths being counted in one jurisdiction while the case is identified or confirmed in another. I’d suggest limiting the test to level 1 jurisdictions, but even then I could imagine medevacs from Nunavut to Ontario or from NWT to Alberta in Canada; I don’t know whether those have happened, but there are countries whose level 1 regions vary wildly in population or medical facilities.
I was also thinking that they should be run as much as possible on full and current datasets, since they are to some extent a canary that could indicate a new problem in data download, cleaning, or processing, as opposed to an introduced problem in the code. These are tests of the combined system of the code and the data sources, not just of the code with a fixed (and presumably sane) data source.
I don’t know how to set them to run nightly either, unless we look at pushing them into the regional data tests used for the badges, or prepare a separate track of tests called from the same GitHub workflows.
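A separate nightly track could be a scheduled workflow alongside the existing ones; a rough sketch (workflow name, schedule, and steps are illustrative, and it assumes the runner can install the package dependencies):

```yaml
# Illustrative only: nightly run of the full-data tests.
name: nightly-data-tests
on:
  schedule:
    - cron: "0 2 * * *"  # every night at 02:00 UTC
jobs:
  full-data-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@v1
      - name: Run tests against the full downloaded data
        run: |
          install.packages("devtools")
          devtools::install_deps(dependencies = TRUE)
          options(testDownload = TRUE)
          devtools::load_all()
          devtools::test()
        shell: Rscript {0}
```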
This PR has been flagged as stale due to lack of activity
This PR adds tests that cases are greater than deaths for each region.
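In outline, the added check looks like the following (a sketch, not the PR's exact code; `get_test_data()` stands in for however the regional dataset is obtained, and `cases_total`/`deaths_total` are assumed cumulative column names):

```r
library(testthat)

# Sketch of the cases-vs-deaths consistency test. `get_test_data()` and the
# column names are assumptions for illustration.
test_that("cumulative cases are at least cumulative deaths in each region", {
  data <- get_test_data()
  failing <- data[!is.na(data$cases_total) & !is.na(data$deaths_total) &
                    data$deaths_total > data$cases_total, ]
  expect_equal(nrow(failing), 0)
})
```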
When using the stored data in custom_data, the tests fail for:
Everything else passes; it looks like slicing the data causes issues with these countries.
On the full downloaded data (not just the first few rows) everything passes except Brazil level 2. The problem there is Cedro do Abaeté, which has 12 cases and 47 deaths. Is this possible, or is it a flaw in the data? If Brazil is a known problem we can write an exception for it and inform users somehow. Currently this test gets run all the time, but I think it would make more sense to attach it to the tests which download the full data every night (not sure how to do this).
(Also, the test highlighted an error in Italy: cases were being used for deaths and tests. I have corrected this here.)