broadinstitute / pooled-cell-painting-profiling-recipe

👩‍🍳 Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License

Aggregate fails when output_single_file_only option set to False #69

Open: gwaybio opened this issue 3 years ago

gwaybio commented 3 years ago

In an experiment with more than 1,000 sites, the aggregate step of the recipe fails quietly. We do not observe any errors, yet the next recipe step still runs and, not surprisingly, fails.

This may be a compute-size issue, but the silent failure is still concerning, and we should address it.
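One way to make such a failure loud is to check both the exit status and the expected outputs of a step before moving on. The sketch below assumes each step is launched as a separate process; `run_step_or_fail()`, its arguments, and any paths passed to it are hypothetical and not part of the recipe.

```python
import subprocess
from pathlib import Path


def run_step_or_fail(cmd, expected_outputs):
    """Run one pipeline step and raise if it did not produce its outputs.

    Hypothetical helper: `cmd` is a list of command-line arguments and
    `expected_outputs` are the file paths the step is supposed to write.
    """
    # check=True raises CalledProcessError on any non-zero exit status,
    # including the negative status of a process killed by a signal
    # (for example, SIGKILL from an out-of-memory killer).
    subprocess.run(cmd, check=True)

    # A step can also exit cleanly without writing anything; treat a missing
    # or empty output file as a failure so the next step never starts.
    missing = [
        str(path)
        for path in map(Path, expected_outputs)
        if not path.is_file() or path.stat().st_size == 0
    ]
    if missing:
        raise RuntimeError(f"step {cmd!r} finished but did not write: {missing}")
```

Calling this for the aggregate step, with the aggregated profile file as the expected output, would stop the pipeline at the step that actually failed instead of at the one after it.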

One option is to aggregate each site independently and then, using the number of single cells per perturbation at each site, weight each site's aggregated contribution in proportion to its cell count. ~I describe this option in #57: time to revisit!~
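Written out, the weighting for one perturbation $p$ could look like the following, where $s$ indexes sites, $n_{p,s}$ is the number of cells of $p$ at site $s$, and $\mathbf{a}_{p,s}$ is the per-site aggregated profile (notation introduced here for illustration, not taken from the recipe):

```math
\tilde{\mathbf{a}}_p = \frac{\sum_{s} n_{p,s}\,\mathbf{a}_{p,s}}{\sum_{s} n_{p,s}}
```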

gwaybio commented 3 years ago

> One option is to aggregate each site independently, and then, with the number of single cells per perturbation, weight the aggregated contribution proportionally to cell count.

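This is only an approximation of the full aggregation. When the aggregation operation is the median, combining per-site results this way is not, in general, exactly equivalent to aggregating all single cells at once. (With the mean, a cell-count-weighted average of per-site means does equal the mean over all cells.)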
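A toy numerical check of when the weighted combination matches pooled aggregation. The values are made up, and the sketch assumes the per-site aggregate is either a mean or a median; `weighted_combine()` is a hypothetical helper.

```python
import numpy as np


def weighted_combine(site_values, site_counts):
    """Combine per-site aggregates into one value, weighting by cell count."""
    values = np.asarray(site_values, dtype=float)
    counts = np.asarray(site_counts, dtype=float)
    return float(np.sum(values * counts) / np.sum(counts))


# Made-up data: one feature for one perturbation, with cells split over two sites.
site_a = np.array([1.0, 2.0, 9.0])
site_b = np.array([4.0, 5.0])
all_cells = np.concatenate([site_a, site_b])
counts = [len(site_a), len(site_b)]

# Mean: the weighted mean of per-site means equals the pooled mean (both 4.2).
approx_mean = weighted_combine([site_a.mean(), site_b.mean()], counts)
exact_mean = float(all_cells.mean())

# Median: the weighted combination of per-site medians (3.0) differs from
# the median of all pooled cells (4.0).
approx_median = weighted_combine([np.median(site_a), np.median(site_b)], counts)
exact_median = float(np.median(all_cells))
```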

In #70, I implement an `approx_aggregate_piecewise()` function that does exactly this. Currently, it only approximates aggregation over sites, but we might also explore aggregating over wells.
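The signature of `approx_aggregate_piecewise()` is not shown here, so the following is a hypothetical pandas sketch of the same idea, not the code in #70. `piecewise_aggregate()` and its parameters are illustrative names. Each piece is a DataFrame of single cells (one site, or one well), so only one piece needs to be in memory at a time, and switching from sites to wells only changes which pieces are passed in.

```python
import pandas as pd


def piecewise_aggregate(pieces, strata, features, operation="median"):
    """Approximate aggregation by processing one piece (e.g. one site) at a time.

    Hypothetical sketch: `pieces` is any iterable of single-cell DataFrames,
    `strata` names the grouping column(s) (e.g. the perturbation), and
    `features` lists the feature columns to aggregate.
    """
    partial = []
    for piece in pieces:
        grouped = piece.groupby(strata)[features]
        agg = grouped.agg(operation)
        agg["n_cells"] = grouped.size()  # cells per stratum in this piece
        partial.append(agg.reset_index())
    combined = pd.concat(partial, ignore_index=True)

    # Weight each piece's aggregate by its cell count, then normalize per stratum.
    weighted = combined[features].mul(combined["n_cells"], axis=0)
    weighted[strata] = combined[strata]
    weighted["n_cells"] = combined["n_cells"]
    totals = weighted.groupby(strata).sum()
    result = totals[features].div(totals["n_cells"], axis=0)
    result["n_cells"] = totals["n_cells"]
    return result.reset_index()
```

Passing a generator such as `(pd.read_csv(path) for path in site_paths)`, with `site_paths` a hypothetical list of per-site files, keeps peak memory close to the size of the largest single piece. With `operation="mean"` the result matches pooled aggregation; with `operation="median"` it is the approximation discussed above.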