Motivation-and-Behaviour / sleepIPD_analysis

Analysis for the sleep and physical activity pooled study (https://osf.io/gzj9w/)

Pipeline changes #76

Closed: tarensanders closed this 1 year ago

tarensanders commented 1 year ago

Makes a few changes that should make the pipeline run a bit better on the HPC. See #72.

I also added Google Cloud Platform (GCP) storage for the data_imp target. It should fit well within the free-tier limit, but if it doesn't, we can easily remove it.

To set up your own access, follow the setup steps in the targets manual, but configure the OAuth JSON to use the bucket I've shared with you. Hopefully I've given you the right permissions.
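For reference, a minimal `_targets.R` sketch of this kind of setup, assuming a targets version that supports `repository = "gcp"` and `tar_resources_gcp()`. The bucket name, prefix, `raw_data`, `read_raw_data()` and `impute_data()` are placeholders, not the names used in this repo. Authentication (the OAuth/service-account JSON) is configured separately, as described in the targets manual.

```r
# _targets.R (sketch): keep only data_imp in a GCP bucket, everything else local.
library(targets)

tar_option_set(
  resources = tar_resources(
    gcp = tar_resources_gcp(
      bucket = "my-sleepipd-bucket",  # hypothetical bucket name
      prefix = "sleepIPD"             # hypothetical object prefix inside the bucket
    )
  )
)

list(
  # Hypothetical upstream target; read_raw_data() would be defined in R/ files.
  tar_target(raw_data, read_raw_data()),
  tar_target(
    data_imp,
    impute_data(raw_data),  # hypothetical imputation function
    repository = "gcp"      # store this target's value in the bucket, not _targets/objects
  )
)
```

Setting the bucket once in `tar_option_set()` and opting in per target with `repository = "gcp"` keeps the large model targets local unless they are explicitly moved.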

Note that for this to work (and actually save us time), we need to push the _targets/meta/meta file to GitHub so that we are all working from the same object hashes.

tarensanders commented 1 year ago

Before I merge this, I have one more (hopefully quick) suggestion.

I was also going to put all the model_list_* targets on GCP, but a quick check shows that they are >500 MB each (and even larger in memory):

r$> library(lobstr)
    library(butcher)
    targets::tar_load(model_list_by_weekday)
    obj_size(model_list_by_weekday)
1.57 GB

Each model fitted to one imputed dataset is ~100 MB, so that's ~300 MB for the 3 imputations of each exposure/outcome pair:

r$> obj_size(model_list_by_weekday$`scale_pa_intensity by scale_sleep_regularity_lag`$model$`3`)
99.99 MB

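For whatever reason, nearly all of this comes from the environment attached to the formula in @call$formula. A formula keeps a reference to the environment it was created in, so everything in that environment is saved along with the model: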
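To illustrate the mechanism, here is a minimal base-R example that is not taken from the pipeline: `make_formula()` is a hypothetical stand-in for a fitting function that has a large object in scope when the formula is created. It uses `length(serialize(...))` as a dependency-free proxy for saved size instead of `lobstr::obj_size()`.

```r
# Hypothetical stand-in for a model-fitting function: it holds a large object
# locally and creates a formula in the same function body.
make_formula <- function() {
  big_data <- rnorm(1e6)  # ~8 MB of doubles in the function's environment
  y ~ x                   # `~` records this environment in the formula
}

f <- make_formula()
# Serializing the formula also serializes its environment, including big_data.
size_before <- length(serialize(f, NULL))

# Point the formula at the global environment instead. The global environment
# is serialized as a reference, so big_data is no longer written out.
environment(f) <- globalenv()
size_after <- length(serialize(f, NULL))

cat("before:", size_before, "bytes\nafter: ", size_after, "bytes\n")
```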

r$> obj_size(model_list_by_weekday$`scale_pa_intensity by scale_sleep_regularity_lag`$model$`3`@call$formula)
98.76 MB

If you strip that environment, the formula comes down to under 4 kB:

r$> model_list_by_weekday$`scale_pa_intensity by scale_sleep_regularity_lag`$model$`3`@call$formula <- butcher::axe_env(model_list_by_weekday$`scale_pa_intensity by scale_sleep_regularity_lag`$model$`3`@call$formula)
    obj_size(model_list_by_weekday$`scale_pa_intensity by scale_sleep_regularity_lag`$model$`3`@call$formula)
3.78 kB

So I'm going to strip this environment before each model is returned and check that the pipeline still runs. I'll do that before merging.
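A minimal sketch of that plan, under stated assumptions: `ToyFit` and `fit_toy()` are hypothetical stand-ins for the real fitted models (which expose the formula at `@call$formula`), and `strip_formula_env()` is a hypothetical helper. The console session above uses `butcher::axe_env()`; this sketch swaps in base R's `environment<-` so it runs without extra packages. The last lines check that the stripped formula can still be used for fitting when the data are passed explicitly.

```r
library(methods)

# Hypothetical stand-in for a fitted S4 model with a `call` slot.
setClass("ToyFit", slots = c(call = "call"))

# Hypothetical fitting step: the formula is created next to a large object and
# the formula *object* is stored inside the call, so its environment comes along.
fit_toy <- function() {
  big_data <- rnorm(1e6)  # ~8 MB captured by the formula's environment
  f <- y ~ x
  new("ToyFit", call = as.call(list(as.name("fit"), formula = f)))
}

# Hypothetical helper: re-point the stored formula at `env` before returning.
strip_formula_env <- function(model, env = globalenv()) {
  f <- model@call$formula
  if (inherits(f, "formula")) {
    environment(f) <- env
    model@call$formula <- f
  }
  model
}

model <- fit_toy()
stripped <- strip_formula_env(model)

size_before <- length(serialize(model, NULL))
size_after <- length(serialize(stripped, NULL))

# The stripped formula still works when the data are supplied via `data =`.
df <- data.frame(x = 1:10, y = 2 * (1:10) + rnorm(10))
refit <- lm(stripped@call$formula, data = df)
```

Pointing the formula at `globalenv()` rather than an empty environment keeps functions such as `log()` resolvable if they appear in the formula, while variables are still looked up in `data` first.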