Is your feature request related to a problem? Please describe.
Our scientific work has established what precomputed random_activity_lists should look like and how they should be built. This ticket captures the design considerations we discussed for giving users full control over using precomputed data and generating new precomputed data on any network configuration of their choice.
Describe the solution you'd like
We would like activity calculations to default to behavior that matches best practices, while allowing that default to be overridden by configuration and by run-time flags. By default, we would like to minimize the size of stored data and maximize speed.
[ ] Move pruner.save_run_information into pruner.save_networks, so that the run information is saved any time someone saves pruned networks.
[ ] Make save_networks create a unique hash that identifies the networks that resulted from pruning. Add that hash to runinformation.text.
[ ] Go back and add a hash to the default networks already in use.
[ ] Add global configuration flags USE_PREGEN_DATA=TRUE and SAVE_NEW_PRECOMPUTE=TRUE.
[ ] Allow the user to reconfigure these flags.
[ ] Add new arguments to enrichment_analysis whose defaults come from these globals. Pass them to the underlying code as necessary.
[ ] Add information to the tutorial about reconfiguring these globals and changing them on the fly at run time.
[ ] Enable multiple data environment paths that can be searched to find matching precomputed datasets.
[ ] As part of multiple data environment paths, have the user set a default path at the beginning. New paths can be added, but one must always be the default.
[ ] In de novo random activities (i.e. the random activity generation code), handle where and how to save. If saving, save to the environment path that matches best. If no environment path matches because no hash of that network exists in any path, use the default data_path configuration.
[ ] Update the tutorial to show how to add new data environment paths and set one as the default.
[ ] In random activities, when saving new precomputed data and it is the first time for a hash, copy runinformation.text to the root hash directory.
[ ] Update the code path where use of pregenerated data is true so that it searches all data environment directories for the hash. If the hash is not found, proceed as described above for de novo generation; if it is found, use that directory as the basis for finding the remaining best match.
[ ] Save all precomputed data already generated on the default networks to Figshare, and update the configuration to fetch it, as is done for the network pickles.
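To make the hash and search-path items concrete, here is a minimal sketch of how they could fit together. It is not the existing implementation: the function names (`network_hash`, `find_hash_dir`, `resolve_save_dir`), the assumed layout of the networks (a dict of edge lists), the 16-character hash length, and the one-directory-per-hash convention are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def network_hash(networks):
    """Return a short, deterministic hash for a set of pruned networks.

    Assumption: `networks` maps a network name to an iterable of
    (kinase, substrate) edges. Edges are sorted first so that the hash
    does not depend on the order in which they were stored.
    """
    canonical = {name: sorted(list(edge) for edge in edges)
                 for name, edges in networks.items()}
    payload = json.dumps(canonical, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def find_hash_dir(net_hash, data_paths):
    """Return the first data path holding a directory for this hash, else None."""
    for root in data_paths:
        candidate = Path(root) / net_hash
        if candidate.is_dir():
            return candidate
    return None


def resolve_save_dir(net_hash, data_paths, default_path):
    """Pick where new precomputed data for this hash should be saved.

    If any data path already has a directory for the hash, reuse it.
    Otherwise create the hash directory under the default path.
    """
    existing = find_hash_dir(net_hash, data_paths)
    if existing is not None:
        return existing
    new_dir = Path(default_path) / net_hash
    new_dir.mkdir(parents=True, exist_ok=True)
    return new_dir
```

In this sketch, the first save for a new hash is the point at which `resolve_save_dir` creates the directory, which is also where copying runinformation.text into the root hash directory would naturally happen.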
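One possible shape for the global flags with run-time override is sketched below. The flag names come from this ticket; everything else (the `Config` dataclass, the `configure` helper, and the argument names on the stand-in `enrichment_analysis`) is hypothetical and only illustrates the pattern. The stand-in uses `None` as the argument default and reads the global at call time, because Python binds default argument values when the function is defined, so a later reconfiguration would otherwise be ignored.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical container for the global flags named in this ticket."""
    USE_PREGEN_DATA: bool = True
    SAVE_NEW_PRECOMPUTE: bool = True


config = Config()


def configure(**overrides):
    """Reconfigure global flags, rejecting names that are not known flags."""
    for name, value in overrides.items():
        if not hasattr(config, name):
            raise AttributeError(f"Unknown configuration flag: {name}")
        setattr(config, name, value)


def enrichment_analysis(use_pregen_data=None, save_new_precompute=None):
    """Stand-in for the real function; only shows how the flags resolve.

    An explicit argument wins; otherwise the current global value is used.
    """
    if use_pregen_data is None:
        use_pregen_data = config.USE_PREGEN_DATA
    if save_new_precompute is None:
        save_new_precompute = config.SAVE_NEW_PRECOMPUTE
    return {
        "use_pregen_data": use_pregen_data,
        "save_new_precompute": save_new_precompute,
    }
```

A user could then call `configure(SAVE_NEW_PRECOMPUTE=False)` once at the start of a session, or pass `use_pregen_data=False` to a single call without touching the global default.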