use caching for processed data

educational-technology-collective / morf

The MOOC Replication Framework (MORF)

MIT License


use caching for processed data #60

Closed jpgard closed 6 years ago

jpgard commented 6 years ago

This would look like:

Implement an aws s3 sync of the processed data bucket at the end of the extract and extract-holdout workflow steps, and at the beginning of the train and test workflow steps.
Use the MORF fetch_file() function to get the extracted data and labels, so that they are read from local copies instead of being pulled from S3.
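A rough sketch of the sync step described above, assuming the AWS CLI is installed and that the cache is a local directory mirroring the processed data bucket. The names `sync_for_step`, `build_sync_command`, `UPLOAD_STEPS`, and `DOWNLOAD_STEPS` are hypothetical and are not part of MORF; only the `aws s3 sync <source> <destination>` command itself is a real CLI call.

```python
import subprocess

# Hypothetical step names, taken from the workflow steps named in this issue.
UPLOAD_STEPS = {"extract", "extract-holdout"}  # sync local results up at the end
DOWNLOAD_STEPS = {"train", "test"}             # sync the bucket down at the start


def build_sync_command(source, destination):
    """Return the argv for `aws s3 sync <source> <destination>`."""
    return ["aws", "s3", "sync", source, destination]


def sync_for_step(step, cache_dir, bucket_url, run=subprocess.run):
    """Sync in the direction the step needs; return the argv used, or None.

    `run` is injectable so the function can be exercised without AWS access.
    """
    if step in UPLOAD_STEPS:
        cmd = build_sync_command(cache_dir, bucket_url)
    elif step in DOWNLOAD_STEPS:
        cmd = build_sync_command(bucket_url, cache_dir)
    else:
        return None  # steps outside the two sets do not touch the cache
    run(cmd, check=True)  # check=True raises if the CLI exits non-zero
    return cmd
```

Because `aws s3 sync` only copies files that are new or changed, repeating it at every train and test step should cost little once the local copy is warm.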

jpgard commented 6 years ago

Resolved with a series of commits from today; added a fetch_Train_test() function to wrap the use of download_train_test() (which shouldn't actually get used unless the MORF admin has deliberately disabled caching).
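The wrapper idea might look roughly like the sketch below. This is not the code from the commits: `fetch_train_test_sketch`, its parameters, and the `download` callable are all hypothetical stand-ins, and falling back to a download on a cache miss is an assumption rather than something stated in this thread.

```python
import os
import shutil


def fetch_train_test_sketch(filename, cache_dir, dest_dir, download,
                            caching_enabled=True):
    """Copy `filename` from the local cache if allowed, else call `download`.

    `download(filename, dest_path)` stands in for a direct S3 download.
    Returns "cache" or "download" to show which path was taken.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, filename)
    cached_path = os.path.join(cache_dir, filename)
    if caching_enabled and os.path.isfile(cached_path):
        shutil.copy(cached_path, dest_path)  # served from the local copy
        return "cache"
    # Reached only when caching is disabled or the file is not cached (assumed fallback).
    download(filename, dest_path)
    return "download"
```

Keeping the direct download behind the wrapper means callers never need to know whether caching is enabled.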