This setup is designed to make sure that 1) the `generate_signals()` scripts continue to run successfully for all indicators, and 2) the output of this function is consistent across indicators. My understanding is that `generate_signals()` exercises much of the overall codebase, so continuously testing it against these various input indicators will be quite useful.
Makes sense!
I've created new test scripts that closely mirror our existing update scripts. I prefer this so that we can keep the testing code as isolated as possible. I also think it makes sense not to bundle this in with the existing `test` or `GMAS_TEST_RUN` variables, since it'll be cleaner to keep those focused on enabling development (and not testing as well).
Makes sense to have separate scripts so that we can use the test data, and makes sense to have dedicated test scripts. A concern I have: say we make changes to an indicator run, such as adding a second plot to the email. That function would need to be imported and passed to `generate_signals()`. Is there any way to ensure a match between `update_{indicator}.R` and `test_{indicator}.R`? Maybe we could enforce that if `update_{indicator}.R` is changed, the matching `test_{indicator}.R` file must be changed as well? This is done often in R packages that won't let you commit changes to an `.Rmd` file if there is no change to the output `.md` file. It would be even better if we could enforce that only when there are changes to the `generate_signals()` call, but I'm not sure that's possible?
Agree about keeping them separate! Maybe that helps us define what we call them? Because `test` is not about testing, but about doing a dry run to view the visual outputs? Could we call it `HS_DRY_RUN`? Does that make sense? And `HS_LOCAL` instead of `GMAS_TEST_RUN`?
I'm imagining we run these scripts with `test = FALSE` and `GMAS_TEST_RUN = TRUE` -- so we mimic the "production" run of the code without connecting to external systems.
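As an illustration, a hypothetical shape of a `test_{indicator}.R` entry point under that configuration. Treating `test` as a `generate_signals()` argument, reading `GMAS_TEST_RUN` as an environment variable, and the fixture path are all assumptions:

```r
# Hypothetical test_{indicator}.R entry point: mimic the production run
# without touching external systems. Assumes `test` is a generate_signals()
# argument and GMAS_TEST_RUN is read via Sys.getenv() elsewhere in the code.
Sys.setenv(GMAS_TEST_RUN = "TRUE")           # keep the run disconnected/local

df_raw <- readRDS("test/data/raw_ipc.RDS")   # hypothetical fixture path

generate_signals(
  df_raw,
  test = FALSE  # exercise the full "production" code path
)
```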
Agreed! However, I can see an issue with the idea to keep `GMAS_TEST_RUN` (I'll call it `HS_LOCAL` below). For the IPC food insecurity runs, maps are loaded in directly from the IPC API, because you can pull in geojson files for specific analyses. So if we wanted to truly make this local, we would need to mock those files as well as the `raw_{indicator}.R` inputs.

And if `HS_LOCAL` is also used for development, in those cases we WOULD want to call the IPC API to download the map and run the `raw_{indicator}.R` file as normal against the API. So maybe `HS_LOCAL` sticks to ensuring the outputs are kept local, and mocking of the inputs happens for the `test_{indicator}.R` portions?
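As a rough sketch of that split, an `HS_LOCAL` flag could gate only where outputs go, leaving inputs (e.g. the IPC API call) untouched for development. The helper names, including `upload_blob()`, are assumptions rather than the actual codebase:

```r
# Sketch: HS_LOCAL controls output destination only; input loading stays as-is.
hs_local <- as.logical(Sys.getenv("HS_LOCAL", unset = "FALSE"))

save_outputs <- function(df, name) {
  if (hs_local) {
    # development / dry run: write to a local folder only
    saveRDS(df, file.path("local_outputs", paste0(name, ".RDS")))
  } else {
    # production: upload to Azure blob storage (hypothetical helper)
    upload_blob(df, container = "prod", name = name)
  }
}
```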
I started looking into `testthat`, which looks like a good package to give us an overall unit testing framework. I haven't pursued it further here just because it'll take me longer to figure out, and I want to prioritize getting some very basic tests in place first.
Yep, it's really good! I think we can build on top of what you put in place to add it in later, as we can add more unit testing on other functions in `utils` and elsewhere. But I don't think it's critical for what you're doing.
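For example, a minimal `testthat` sketch we could layer on later. The expected column names, the fixture path, and the `generate_signals()` call signature here are assumptions:

```r
library(testthat)

test_that("generate_signals() returns a consistent signals data frame", {
  df_raw  <- readRDS("test/data/raw_idmc.RDS")   # hypothetical fixture
  signals <- generate_signals(df_raw, test = FALSE)

  expect_s3_class(signals, "data.frame")
  expect_true(all(c("iso3", "indicator", "alert_level") %in% names(signals)))
  expect_gt(nrow(signals), 0)
})
```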
The key piece of work here will be creating dummy raw input datasets for each indicator. I think these should be relatively small, but should cover as many edge cases as we can think of. I'm undecided on the best way to store this data. Should it be hard-coded dataframes in R? Parquet files in the codebase? Dummy files in Azure? I would lean away from external storage (like Azure) so that we're not potentially messing up our production data, and also because we don't want potential cloud connection issues interfering with our testing.
Okay, I think it's a good idea not to store this on Azure. My initial idea was to have scripts that generate test data by filtering real data to include all of the edge cases we want. That way we could regenerate it easily if the source data changes. From what I've seen elsewhere, it seems fairly common to hardcode test data in scripts, as you mentioned. However, some of the raw data frames are quite large in terms of columns, and might be a pain to maintain even if the number of rows is limited.

Actually, thinking about it, I think hardcoding would be extremely difficult, because to test the flagging methods for IDMC displacement and ACLED conflict we need over 3 years of events data for a country, which is a significant amount of data to hardcode. I think the filtering option would be the only possibility in this case. But then how do we match the final output with that initial filtered data? Do we generate it once, double check it, and then hardcode the final output?
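A rough sketch of the filtering approach, with a hypothetical `read_raw_idmc()` loader and column names standing in for whatever the real `raw_{indicator}.R` logic does:

```r
# Sketch: pull the production raw data once, cut it down to a few countries
# that cover the edge cases, and freeze it as a fixture for the tests.
library(dplyr)

raw <- read_raw_idmc()  # hypothetical: however raw_idmc.R loads the data

fixture <- raw |>
  filter(
    iso3 %in% c("AFG", "SSD"),                      # edge-case countries
    displacement_date >= Sys.Date() - 365 * 3 - 30  # keep 3+ years for flagging
  )

saveRDS(fixture, "test/data/raw_idmc.RDS")
```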
Closing in favour of #172
Closes #110 and #94.
So far, I'm working to add basic tests for all indicators. These tests should run without external API calls (e.g. to the IPC or Mailchimp APIs). However, we'll still need connections to the Prod Azure Blob to read in the necessary static files for indicators (e.g. geojsons for mapping).

This PR also includes changes to move globally defined dataframes into functions, so that they won't be called unnecessarily by imports.
The tests for all indicators should be extended beyond the base cases added here. This can be done by running the `generate_signals_wrapper()` functions against different input raw datasets, which are generated manually and stored in `test/data/*.RDS`.
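As a sketch of that extension, the stored fixtures could simply be looped over; the wrapper's argument is an assumption:

```r
# Sketch: run the signals wrapper against every manually generated fixture.
fixtures <- list.files("test/data", pattern = "\\.RDS$", full.names = TRUE)

for (path in fixtures) {
  df_raw <- readRDS(path)
  message("Running signals for fixture: ", basename(path))
  generate_signals_wrapper(df_raw)
}
```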