caldwellst opened 1 month ago
This has been addressed in #217 with `read_az_file_cached()`. However, the code still needs to be combed through to make sure that `read_az_file_cached()` and `az_file_detect_cached()` are used where we think they should be. For instance, use them when we expect the results not to change, such as when reading a file in `input/...`. If we are reading files from `output/...`, we shouldn't use the cached versions because we expect those files to change.
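To show why `output/...` reads should stay uncached, here is a minimal sketch that assumes `read_az_file_cached()` is a `{memoise}` wrapper around `read_az_file()` (the #217 implementation may differ). The `read_az_file()` below is a hypothetical stand-in that reads a local CSV instead of the Azure store, so the example runs offline.

```r
library(memoise)

# Hypothetical stand-in for cloud_storage$read_az_file(): reads a local
# CSV instead of calling Azure, so the sketch runs without credentials.
read_az_file <- function(path) {
  utils::read.csv(path)
}

# Assumed shape of read_az_file_cached(): the same reader, memoised on `path`.
read_az_file_cached <- memoise(read_az_file)

path <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(x = 1), path, row.names = FALSE)

first <- read_az_file_cached(path)

# Simulate a pipeline step overwriting the file, as happens under output/.
utils::write.csv(data.frame(x = 2), path, row.names = FALSE)

stale <- read_az_file_cached(path) # same `path`, so served from the cache (x = 1)
fresh <- read_az_file(path)        # re-read from disk (x = 2)
```

The cache key is built only from the arguments, so the memoised reader cannot tell that the file behind `path` has changed. That is harmless for static `input/` files and wrong for anything that gets rewritten.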
The functions in `audience.R` listed below should also be cached. You'll need to go through and make sure this has been done.
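For the `audience.R` functions, a minimal `{memoise}` sketch could look like the following. The function bodies are placeholders, and the `group_id` argument to `mc_members()` is hypothetical: the real functions call an external API whose details aren't described in this issue. The `api_calls` counter exists only to show which calls reach the "API".

```r
library(memoise)

api_calls <- 0

# Placeholder for the real API request made by audience$mc_groups().
mc_groups_uncached <- function() {
  api_calls <<- api_calls + 1
  data.frame(id = c("g1", "g2"))
}

# Placeholder for audience$mc_members(); `group_id` is a hypothetical argument,
# so results are cached separately for each group.
mc_members_uncached <- function(group_id) {
  api_calls <<- api_calls + 1
  data.frame(group_id = group_id, member = paste0(group_id, "_m", 1:2))
}

mc_groups <- memoise(mc_groups_uncached)
mc_members <- memoise(mc_members_uncached)

mc_groups()
mc_groups()          # cache hit: no new API call
mc_members("g1")
mc_members("g1")     # cache hit for the same argument
mc_members("g2")     # new argument: one more API call

forget(mc_groups)    # clear the cache, e.g. if the groups may have changed
mc_groups()          # calls the API again
```

The default cache lives in memory for the R session, so each new session (including each test run) starts empty; `forget()` is the escape hatch when fresh data is needed mid-session.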
Previously, some calls to external APIs were performed at the time a module was loaded. This ensured that subsequent loading of the module (or other data) would not continually call the same API endpoint. For instance, the data on the cloud at `input/indicator_mapping.parquet` would be loaded directly in the `custom_segmentation.R` module and then used within each function, ensuring it was only fetched once, since we expect it to be relatively static. Every time we used `a()`, we wouldn't have to reload the data from the cloud.

However, the issue is that these calls to the cloud are made every time a module is loaded. In #247, we are implementing comprehensive testing of the entire project, which means that modules are loaded for testing, and we don't want to make any external API calls when testing. We therefore need to move all calls to external APIs inside functions, which can then be mocked/stubbed so the calls are never actually made. A call at the module level can't be mocked: we can't stub anything until the module is loaded, and loading the module is what triggers the call.

Once #247 is merged, we should add caching to the set of functions whose data was previously loaded at the module level. I have used the `{memoise}` package in the past, which makes it easy to set up caches for functions keyed on their input parameters. It should be easy enough to implement. Here is a list of the modules and functions that should be cached:
- `audience$mc_groups()`
- `audience$mc_members()`
- `cloud_storage$read_az_file("input/indicator_mapping.parquet")`: how do we memoise this call, but NOT when we read other data from the Azure store? We don't necessarily want all of these calls cached, only some of the static calls to the `input` directory. Is there a way to cache only if the call is made with a path string indicating that the file is stored in `input` and not in `output`? Maybe for the `cloud_storage` module we could, for instance, create a function that uses the cache, `read_az_file_cached()`, and is only called when appropriate? Otherwise we use `read_az_file()`.
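One possible answer to the question above is a small dispatcher that caches only paths under `input/`, combined with moving the module-level load into a function. This is a minimal sketch under assumptions: `read_az_file()` is a hypothetical stand-in that records "downloads" instead of calling Azure, `read_az_file_cached()` is assumed to be a `{memoise}` wrapper, and `read_az_file_auto()`, `get_indicator_mapping()` and `output/results.parquet` are hypothetical names used for illustration.

```r
library(memoise)

# Hypothetical stand-in for the uncached cloud_storage$read_az_file();
# the real one downloads from Azure. Here it just records each "download".
downloads <- character()
read_az_file <- function(path) {
  downloads <<- c(downloads, path)
  data.frame(path = path)
}

# Assumed shape of read_az_file_cached(): memoised on `path`.
read_az_file_cached <- memoise(read_az_file)

# Hypothetical dispatcher: cache only paths under input/, treated as static;
# everything else (e.g. output/) is re-read on every call.
read_az_file_auto <- function(path) {
  if (startsWith(path, "input/")) {
    read_az_file_cached(path)
  } else {
    read_az_file(path)
  }
}

# Replaces the module-level load in custom_segmentation.R: nothing is read
# when the module loads, so a test can stub this function (or the reader)
# before any call is made. The first real call fetches; later calls hit the cache.
get_indicator_mapping <- function() {
  read_az_file_auto("input/indicator_mapping.parquet")
}

get_indicator_mapping()
get_indicator_mapping()                    # cached: no second download
read_az_file_auto("output/results.parquet")
read_az_file_auto("output/results.parquet") # output/ is always re-read
```

Routing on the prefix keeps the decision in one place, so callers don't have to pick a reader. The trade-off is that a static file stored outside `input/` would never be cached. The explicit `read_az_file_cached()` approach leaves that choice with each caller instead.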