broadinstitute / pooled-cell-painting-profiling-template

:hammer_and_wrench: Use me to version control Pooled Cell Painting data and processing pipelines
BSD 3-Clause "New" or "Revised" License

process_configuration is very slow #18

Closed gwaybio closed 3 years ago

gwaybio commented 3 years ago

Each time we run a step in the pipeline, we execute process_configuration(). This is very slow to run (~5 min)!

It seems like it should not take this long. The delay hurts us during troubleshooting and increases run time in general. We can make this faster for most use cases.

I believe the key step slowing us down is the need to specify all site files:

https://github.com/broadinstitute/pooled-cell-painting-profiling-template/blob/95edc2ccdf818ebde6e2c418ec7b47835f08e4cd/config/utils/config_utils.py#L144-L146

We should consider invoking this step only when necessary, which should speed things up.
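One way this could look, as a rough sketch only: defer the site-file scan until something actually asks for it, and cache the result. Everything here is an assumption for illustration, not the current code in config_utils.py: the `LazySiteConfig` class, the `site_files` attribute, the `batch_dir` argument, and the `*.csv` glob pattern are all hypothetical names.

```python
import pathlib
from functools import cached_property


class LazySiteConfig:
    """Hypothetical config wrapper that defers the expensive site-file scan."""

    def __init__(self, config, batch_dir):
        # Cheap setup only: no filesystem walk happens here.
        self.config = dict(config)
        self.batch_dir = pathlib.Path(batch_dir)

    @cached_property
    def site_files(self):
        # Expensive step: walk the batch directory on first access only.
        # cached_property stores the result on the instance, so later
        # accesses reuse it instead of rescanning.
        return sorted(p for p in self.batch_dir.rglob("*.csv") if p.is_file())
```

With something like this, steps that never touch `site_files` would skip the scan entirely, and steps that do need it would pay the cost once per run. Whether this fits depends on how many pipeline steps actually need the full site list.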