implement an aws s3 sync for the processed data bucket at the end of the extract and extract-holdout workflow steps, and at the beginning of the train and test workflow steps
use the fetch_file() MORF function to get extracted data and labels, so that these are read from local copies instead of being pulled from s3 each time.
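A rough sketch of where the sync calls could sit, in Python. Everything named here (SYNC_POINTS, sync_command, should_sync, run_sync, the bucket and directory arguments) is hypothetical and not taken from the MORF codebase. It also assumes the sync always goes from the bucket to a local mirror; the issue itself does not state the direction.

```python
import subprocess

# Hypothetical mapping of workflow step -> point at which the local mirror
# is refreshed, following the steps named in this issue.
SYNC_POINTS = {
    "extract": "end",
    "extract-holdout": "end",
    "train": "start",
    "test": "start",
}


def sync_command(bucket, local_dir):
    """Build the argv for `aws s3 sync` from the bucket to a local directory.

    Assumption: the sync direction is S3 -> local.
    """
    return ["aws", "s3", "sync", f"s3://{bucket}", str(local_dir)]


def should_sync(step, phase):
    """Return True if `step` should sync at `phase` ("start" or "end")."""
    return SYNC_POINTS.get(step) == phase


def run_sync(step, phase, bucket, local_dir):
    """Run the sync if this step/phase is one of the sync points."""
    if should_sync(step, phase):
        # Requires the AWS CLI on PATH and valid credentials.
        subprocess.run(sync_command(bucket, local_dir), check=True)
```

A workflow runner would call run_sync(step, "start", ...) before a step and run_sync(step, "end", ...) after it; steps not listed in SYNC_POINTS are left alone.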
resolved with a series of commits from today; added a fetch_Train_test() function to wrap use of download_train_test() (which shouldn't actually get used unless a MORF admin has deliberately disabled caching)
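The wrapper idea in this comment could look roughly like the sketch below. This is not the actual fetch_Train_test() or download_train_test() code: fetch_with_cache, its parameters, and the download callback are hypothetical stand-ins, and the decision to raise when a cached file is missing is an assumption.

```python
import shutil
from pathlib import Path


def fetch_with_cache(name, cache_dir, dest_dir, download, caching_enabled=True):
    """Copy `name` from a local cache, or download it if caching is disabled.

    `download` is a hypothetical callable taking (name, dest_path); it stands
    in for the remote download path and is only used when caching is off.
    """
    dest = Path(dest_dir) / name
    dest.parent.mkdir(parents=True, exist_ok=True)

    if not caching_enabled:
        # Caching deliberately disabled: fall back to the remote download.
        download(name, dest)
        return dest

    cached = Path(cache_dir) / name
    if not cached.is_file():
        # Assumption: a missing cached file is an error, not a silent download.
        raise FileNotFoundError(f"{name} not found in local cache {cache_dir}")

    shutil.copy2(cached, dest)
    return dest
```

With caching on, the download callable is never invoked, which matches the intent that the remote download only runs when caching has been turned off on purpose.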