OCHA-DAP / hdx-signals

HDX Signals
https://un-ocha-centre-for-humanitarian.gitbook.io/hdx-signals/

Add caching functionality for certain calls #248

Open caldwellst opened 1 month ago

caldwellst commented 1 month ago

Previously, some calls to external APIs were made when a module was loaded, so the result was fetched once and reused rather than requested again on every function call. For instance, the cloud file input/indicator_mapping.parquet was read at the top level of the custom_segmentation.R module and then used inside each function. Because we expect that file to be relatively static, reading it once was enough.

# runs once, when the module is loaded
df_ind <- cs$read_az_file("input/indicator_mapping.parquet")

a <- function() {
  # do stuff with df_ind
}

Every time we used a(), we wouldn't have to reload the data from the cloud. The problem is that these cloud calls are made every time a module is loaded. In #247, we are implementing comprehensive testing of the entire project, which means modules are loaded during testing, and we don't want to make any external API calls in tests. We therefore need to move all calls to external APIs inside functions, which can then be mocked/stubbed so the calls are never actually made. Module-level calls can't be handled this way, because a mock can only be applied once the module has loaded, by which point the call has already run.
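To illustrate why moving the call inside a function helps with testing, here is a minimal base R sketch. It uses plain dependency injection (passing the reader as an argument) rather than a mocking library, and the names read_indicator_mapping and stub_reader are hypothetical, not part of the project.

```r
# Hypothetical stand-in for the real cloud read. It errors so that any
# accidental external call is obvious when running the sketch.
read_indicator_mapping <- function() {
  stop("external call to the cloud: should not happen in tests")
}

# The read now happens inside the function, via a replaceable reader,
# instead of at the top level of the module.
a <- function(read = read_indicator_mapping) {
  df_ind <- read()
  nrow(df_ind)
}

# In a test, swap in a stub that returns a small local data frame.
stub_reader <- function() {
  data.frame(indicator_id = c("x", "y", "z"), stringsAsFactors = FALSE)
}

result <- a(read = stub_reader)
result # 3, and no external call was made
```

Loading this code makes no external call by itself. The call only happens when a() is invoked with the real reader.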

Once #247 is merged, we should add caching to the set of functions whose results were previously loaded at the module level. I have used the {memoise} package in the past, which makes it easy to cache a function's results keyed on its input arguments. Should be easy enough to implement.
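As a rough sketch of what this could look like with {memoise}: the reader below is a hypothetical stand-in for the real cloud read, and the names read_file and read_file_memo are assumptions for illustration only. A counter shows that the underlying function runs once per distinct argument.

```r
library(memoise)

# Counter kept in an environment so the reader can update it in place.
counter <- new.env()
counter$n <- 0

# Hypothetical stand-in for a cloud read; a real version would call the API.
read_file <- function(path) {
  counter$n <- counter$n + 1
  data.frame(path = path, stringsAsFactors = FALSE)
}

# Memoised wrapper: results are cached in memory, keyed on the arguments.
read_file_memo <- memoise(read_file)

a <- function() {
  df_ind <- read_file_memo("input/indicator_mapping.parquet")
  nrow(df_ind)
}

a()
a()
counter$n # 1: the second call to a() was served from the cache
```

Nothing is read when this code is loaded, only on the first call to a(). If a cached value ever needs refreshing within a session, memoise::forget(read_file_memo) clears the cache.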

Here is a list of the modules and functions that should be cached:

caldwellst commented 2 weeks ago

This has been addressed in #217 with read_az_file_cached(). However, the code still needs to be combed through to make sure that read_az_file_cached() and az_file_detect_cached() are used where we think they should be. Use the cached versions when we expect the result not to change, such as when reading a file in input/.... When reading files from output/..., we shouldn't use the cached version because we expect those files to change.
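The rule of thumb above could be written down roughly as follows. This is only a sketch: read_fresh, read_cached, and read_by_prefix are hypothetical names, a small environment-based cache stands in for the project's cached readers, and output/example.parquet is a made-up path.

```r
# Hypothetical stand-in for an uncached cloud read; logs every call it receives.
calls <- new.env()
calls$log <- character()
read_fresh <- function(path) {
  calls$log <- c(calls$log, path)
  paste("contents of", path)
}

# Simple in-memory cache keyed by path (stand-in for a cached reader).
cache <- new.env()
read_cached <- function(path) {
  if (!exists(path, envir = cache, inherits = FALSE)) {
    assign(path, read_fresh(path), envir = cache)
  }
  get(path, envir = cache, inherits = FALSE)
}

# Route by prefix: static inputs are cached, changing outputs are re-read.
read_by_prefix <- function(path) {
  if (startsWith(path, "input/")) read_cached(path) else read_fresh(path)
}

read_by_prefix("input/indicator_mapping.parquet")
read_by_prefix("input/indicator_mapping.parquet") # served from the cache
read_by_prefix("output/example.parquet")
read_by_prefix("output/example.parquet") # read again

length(calls$log) # 3: one read for the input file, two for the output file
```

In practice the choice is made at each call site rather than by a dispatcher like this. The sketch only makes the input/ versus output/ distinction explicit.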

The functions in audience.R listed above should also be cached. You'll need to go through and make sure this has been done.