angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Speed up pixel clustering for large images #1081

Closed by janinemelsen 8 months ago

janinemelsen commented 10 months ago

Hi! I came across your analysis pipeline and would love to try the pixel clustering to avoid the issue of spatial signal overlap. I tested the pipeline using the Jupyter notebooks and the example dataset, and everything runs smoothly.

My images are pretty big (6134x9816 pixels), and after 24 hours the analysis is still busy assigning the pixel clusters (the pixel_som_clustering.cluster_pixels function). Is there a way to speed up this process?

cliu72 commented 9 months ago

Hi @janinemelsen! Thanks for bringing this up. We are aware that this is a pain point in our pipeline. One major reason is that we use feather files (a compressed file format), which saves space on the machine but is unfortunately slow to read and write. We are currently exploring options for moving away from feather files, but there is no concrete timeline for when that will be incorporated.

We recently did incorporate a change that should speed up cluster_pixels: https://github.com/angelolab/ark-analysis/pull/1069. I would also suggest trying to parallelize it using the multiprocess parameter. Hope this helps.
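To illustrate the idea behind per-FOV parallelism, here is a minimal sketch. It is not ark-analysis code: `process_fovs`, `assign_one_fov`, and `executor_cls` are hypothetical names chosen for this example, and the sketch assumes each FOV can be processed independently of the others.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_fovs(fov_names, assign_one_fov, max_workers=4,
                 executor_cls=ProcessPoolExecutor):
    """Call assign_one_fov (hypothetical worker) once per FOV, in parallel.

    Returns a dict mapping each FOV name to the worker's return value.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with fov_names
        results = list(pool.map(assign_one_fov, fov_names))
    return dict(zip(fov_names, results))
```

With `ProcessPoolExecutor`, the worker must be a picklable top-level function, and on platforms that spawn new processes the call should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` uses threads instead. Note that this kind of parallelism only helps when there are several FOVs; a single very large image is still processed by one worker.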

janinemelsen commented 9 months ago

Maybe a stupid question, but... if I train the pixel SOM on 50% of the dataset instead of 10%, mapping the pixels goes pretty fast. I'm not doing anything wrong here, right? :P I didn't expect this, since I thought the point of training on a subset and then mapping was to avoid clustering all the pixels at once, which should save time.

I will also try your suggestions, thanks!

cliu72 commented 9 months ago

Hmm, that's pretty strange. The cluster_pixels function should be independent of the subsetting percentage (the cluster_pixels function reads in the trained weights and assigns each fov using those weights). I wonder if you pulled the latest version of the repo in between doing the 50% and 10% tests? We incorporated the change I mentioned above relatively recently, and it makes cluster_pixels much faster. Otherwise, I'm not sure why there would be a speed difference.
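As a rough illustration of why the assignment step should not depend on the training subset, here is a minimal NumPy sketch of mapping pixels to already-trained SOM weights. It is not the ark-analysis implementation: `assign_pixels` and `chunk_size` are hypothetical names, and the sketch assumes each pixel is assigned to the node whose weight vector is nearest in Euclidean distance.

```python
import numpy as np


def assign_pixels(pixels, weights, chunk_size=100_000):
    """Return, for each pixel row, the index of the nearest weight row.

    pixels:  (n_pixels, n_channels) array
    weights: (n_nodes, n_channels) array of trained SOM weights
    """
    labels = np.empty(pixels.shape[0], dtype=np.int64)
    # ||w||^2 for every node, computed once
    w_sq = np.sum(weights ** 2, axis=1)
    # Work in chunks so the distance matrix stays small in memory
    for start in range(0, pixels.shape[0], chunk_size):
        chunk = pixels[start:start + chunk_size]
        # ||x - w||^2 = ||x||^2 + ||w||^2 - 2 x.w; the ||x||^2 term is the
        # same for every node, so it can be dropped before taking argmin
        dists = w_sq[None, :] - 2.0 * (chunk @ weights.T)
        labels[start:start + chunk.shape[0]] = np.argmin(dists, axis=1)
    return labels
```

The work done here scales with the total number of pixels and the number of SOM nodes. The fraction of pixels used for training only affects how `weights` was produced, not how long the assignment takes.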

janinemelsen commented 8 months ago

Yes, you are right, it's the update that makes the huge difference! Now it works smoothly :)