czbiohub-sf / shrimPy

shrimPy: Smart High-throughput Robust Imaging & Measurement in Python
BSD 3-Clause "New" or "Revised" License

GPU deskew #146

Closed talonchandler closed 2 weeks ago

talonchandler commented 3 weeks ago

This PR replaces the existing scipy.ndimage-based deskew with a GPU-accelerated MONAI deskew.

-- Benchmarks: Deskewing a 280x600x1372 Argolight target takes:

(Deskewing function calls, including CPU->GPU->CPU transfers; most relevant to @ieivanov's live applications)
Before: 61 s
After: 5.1 s

(Deskewing CLI calls, including IO, imports, and other overhead)
Before: 82 s
After: 49 s
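For reference, the speedups implied by these timings can be worked out directly. This is only arithmetic on the numbers quoted above; the dictionary keys and the `speedup` helper are illustrative names, not part of shrimPy.

```python
# Timings quoted in the benchmarks above, as (before, after) in seconds.
benchmarks = {
    "function call": (61.0, 5.1),  # includes CPU->GPU->CPU transfers
    "CLI call": (82.0, 49.0),      # includes IO, imports, and other overhead
}


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the 'after' timing is."""
    return before_s / after_s


speedups = {name: speedup(before, after) for name, (before, after) in benchmarks.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```

The function-level speedup is roughly 12x, while the CLI-level speedup is under 2x.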

-- The new and old deskews do not match exactly for two reasons:

(1) MONAI's GPU deskew only supports nearest-neighbor and bilinear interpolation modes, not the linear spline interpolation that we used previously. I chose to use bilinear interpolation, and on close inspection (screenshots don't show any difference) the only differences I observe are:

@ieivanov I would appreciate your help in scrutinizing a few initial deskews as we onboard this change.

(2) MONAI's GPU deskew only supports filling empty values with zeros, not the background-estimated fill that we used previously. This difference is not important because we almost always clip off the overhang, leaving no values to fill.
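To make points (1) and (2) concrete, here is a minimal sketch on toy data. It is not shrimPy's or MONAI's implementation: it uses a 2D shear with `scipy.ndimage.affine_transform`, and the array sizes, the `shear` factor, the `deskew_2d` helper, and the median-based background estimate are all assumptions chosen for illustration. It compares zero fill against a background-valued fill, and linear against nearest-neighbor interpolation, both on the full output and after clipping the overhang.

```python
import math

import numpy as np
from scipy import ndimage

# Hypothetical toy data: a 2D (scan, x) slice sitting on a background of ~100.
rng = np.random.default_rng(0)
n_scan, n_x = 4, 12
img = 100.0 + rng.random((n_scan, n_x))

shear = 1.5  # assumed shift in pixels per scan step
max_shift = shear * (n_scan - 1)
out_shape = (n_scan, n_x + math.ceil(max_shift))

# affine_transform maps output coordinates to input coordinates:
#   input_coord = matrix @ output_coord + offset
# so out[i, j] samples img[i, j - shear * i] (row i is shifted right by shear * i).
matrix = np.array([[1.0, 0.0], [-shear, 1.0]])


def deskew_2d(image, order, cval):
    """Shear a 2D image; samples outside the input are set to cval."""
    return ndimage.affine_transform(
        image, matrix, output_shape=out_shape, order=order, mode="constant", cval=cval
    )


background = float(np.median(img))  # stand-in for a background-estimated fill
linear_zero_fill = deskew_2d(img, order=1, cval=0.0)
linear_bg_fill = deskew_2d(img, order=1, cval=background)
nearest_zero_fill = deskew_2d(img, order=0, cval=0.0)

# Columns where every row samples inside the input, i.e. the overhang is clipped off.
crop = slice(math.ceil(max_shift), n_x)

fill_diff_full = np.abs(linear_zero_fill - linear_bg_fill).max()
fill_diff_cropped = np.abs(linear_zero_fill[:, crop] - linear_bg_fill[:, crop]).max()
interp_diff_cropped = np.abs(linear_zero_fill[:, crop] - nearest_zero_fill[:, crop]).max()

print("fill difference, full output:", fill_diff_full)
print("fill difference, cropped:", fill_diff_cropped)
print("linear vs nearest, cropped:", interp_diff_cropped)
```

In this sketch the fill value only affects the overhang, so the two fills agree once the overhang is clipped, whereas the choice of interpolation mode changes values inside the clipped region too.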

You can take a closer look at my argolight tests here: /hpc/projects/comp.micro/mantis/2024_04_23_mantis_alignment/2-deskew/test-monai

--

Other notes:

ieivanov commented 2 weeks ago

I think this is ready, @talonchandler @edyoshikun could you please take another look?

talonchandler commented 2 weeks ago

LGTM! Thanks for wrangling tests, @ieivanov.

talonchandler commented 2 weeks ago

> For the opencell dataset, 1 timepoint and 1 channel took about 3 min to deskew. Somehow I recall this being faster. Was that the same experience you guys had?

Thanks for testing @edyoshikun. I only ran before-and-after benchmarks on the Argolight target.

How large was the opencell volume you tested? Is the ~3 minutes consistent with the 49 seconds I clocked for my 280x600x1372 volume?
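A rough way to check consistency is to ask how large a volume would have to be to take ~3 minutes at the rate implied by the Argolight benchmark. The sketch below assumes runtime scales linearly with voxel count and ignores fixed overhead such as imports, which is a simplification; the variable names are illustrative.

```python
import math

# Reference point from the benchmarks: CLI deskew of the Argolight volume.
reference_shape = (280, 600, 1372)
reference_seconds = 49.0
reference_voxels = math.prod(reference_shape)

# ASSUMPTION: runtime is proportional to voxel count (no fixed overhead).
voxels_per_second = reference_voxels / reference_seconds

reported_seconds = 3 * 60  # the ~3 min reported for the opencell volume
implied_voxels = voxels_per_second * reported_seconds
size_ratio = implied_voxels / reference_voxels

print(f"reference volume: {reference_voxels:,} voxels")
print(f"implied volume for 3 min: {implied_voxels:,.0f} voxels")
print(f"size ratio vs. reference: {size_ratio:.2f}x")
```

Under these assumptions, a volume a bit under four times the size of the Argolight target would account for the ~3 minutes.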

ieivanov commented 2 weeks ago

We just discussed this: the CLI call still executes on the CPU, as before. deskew_data has an option to use GPUs, but that option is not exposed in the CLI. That's OK because we speed up CLI runs with slurm and CPU multiprocessing. The option to deskew data on GPU is most useful for live data reconstruction / visualization, which currently happens through scripts.