angelolab / mibi-bin-tools

Tools for extracting TIFF images from mibiscope bin files
MIT License
0 stars 3 forks source link

Parallelize histogram and MPH calculations #63

Closed alex-l-kong closed 1 year ago

alex-l-kong commented 1 year ago

What is the purpose of this PR?

Closes #58. Closes #62. Optimizes extraction of histograms needed for median pulse height calculations by doing it in parallel.

How did you implement your changes

Modifies:

The functions in bin_files.py now take a list of channels (or None to extract data for all channels). This then propagates over to c_extract_histograms so the underlying loops only need to run once (previously, they ran multiple times for each channel, needlessly increasing the complexity by a factor of num_channels_to_extract).

Remaining issues

This will need a new release, but we'll need to keep toffy pinned at mibi-bin-tools==0.2.10 for the time-being, since this introduces breaking changes. A separate PR will update toffy to be compatible with the new median pulse height calculation process (and subsequence 0.2.11 release).

ngreenwald commented 1 year ago

Maybe first leave the old, incorrect logic for corner cases intact, and make sure the parallelization doesn't change anything. Once you've confirmed that, see how big of a difference switching the logic makes across some real data

alex-l-kong commented 1 year ago

@ngreenwald already been testing on some SPAIN cohort FOVs, confirmed that the outputs (widths, intensities, pulse_counts, and median pulse heights) match up for all of them. Optimization provides a 5-7x speed increase.

ngreenwald commented 1 year ago

Okay got it. And how significantly do the outputs change with the fixed logic?

alex-l-kong commented 1 year ago

Okay got it. And how significantly do the outputs change with the fixed logic?

With the updated _minimum_larger_value_in_sorted, only around 0.05% of pixels are affected (makes sense, since #62 would only apply to a small subset of pixels). However, of those affected, their intensities change dramatically, up to a thousand or even 5000 or higher in some cases.

ngreenwald commented 1 year ago

Okay, so the overall MPH of a given image doesn't change appreciably?

alex-l-kong commented 1 year ago

Okay, so the overall MPH of a given image doesn't change appreciably?

Barely any changes at all, single digit differences at most.