Closed: janinemelsen closed this issue 8 months ago
Hi @janinemelsen! Thanks for bringing this up. We are aware that this is a pain point in our pipeline. One major reason is that we use feather files (a compressed file format), which save space on the machine but are unfortunately slow to read and write. We are currently exploring options for moving away from feather files, but there is no concrete timeline for when that will happen.
We recently did incorporate a change that should speed up `cluster_pixels`: https://github.com/angelolab/ark-analysis/pull/1069. I would also suggest trying to parallelize it using the `multiprocess` parameter. Hope this helps.
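For anyone wondering what per-FOV parallelization looks like in general, here is a minimal sketch using Python's standard `concurrent.futures`. It is not ark's implementation and does not call the ark API: `assign_fov` and `assign_all` are hypothetical names, and the workload is a dummy computation standing in for "read one FOV, assign its pixels, write the result".

```python
from concurrent.futures import ProcessPoolExecutor


def assign_fov(fov_name):
    """Hypothetical stand-in for the per-FOV work (read pixels, assign, write back)."""
    # Dummy CPU-bound workload so the sketch runs anywhere without data files.
    checksum = sum(i * i for i in range(100_000))
    return fov_name, checksum


def assign_all(fov_names, max_workers=None):
    """Process every FOV: serially if max_workers == 1, otherwise in worker processes."""
    if max_workers == 1:
        return dict(assign_fov(name) for name in fov_names)
    # FOVs are independent of each other, so they can be handed to separate processes.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(assign_fov, fov_names))
```

When using worker processes, call `assign_all` from under an `if __name__ == "__main__":` guard in a script. Note that if the bottleneck is file reading and writing rather than computation, extra processes may help less than expected.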
Maybe a stupid question, but... if I train the pixel SOM on 50% of the dataset instead of 10%, mapping the pixels goes pretty fast. I'm not doing anything illegal here, right? :P I didn't expect this, since I thought that training and mapping were done separately to avoid clustering all the pixels at once, which should save time.
I will also try your suggestions, thanks!
Hmm that's pretty strange. The `cluster_pixels` function should be independent of the subsetting percentage (the `cluster_pixels` function reads in the trained weights and assigns each fov using those weights). I wonder if you pulled the new repo in between doing the 50% and 10% tests? We incorporated the change I mentioned above relatively recently that makes `cluster_pixels` much faster. Otherwise, I'm not sure why there's the speed difference.
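To illustrate why the assignment step should not depend on the training subset, here is a toy sketch of SOM assignment. It assumes the textbook rule (each pixel goes to the node with the nearest weight vector by Euclidean distance); the actual ark code may normalize or batch differently. The function names and the toy values are made up for illustration, and plain Python is used for readability where a real pipeline would vectorize.

```python
def nearest_node(pixel, weights):
    """Return the index of the SOM node whose weight vector is closest to the pixel."""
    best_index, best_dist = 0, float("inf")
    for index, node in enumerate(weights):
        # Squared Euclidean distance across channels.
        dist = sum((p - w) ** 2 for p, w in zip(pixel, node))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def assign_clusters(pixels, weights):
    """Assign every pixel to its nearest node; one pass over all pixels."""
    return [nearest_node(pixel, weights) for pixel in pixels]


# Toy trained weights: 3 nodes, 2 channels (hypothetical values).
weights = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Toy FOV with 4 pixels, 2 channels each.
pixels = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8), (0.6, 0.1)]
labels = assign_clusters(pixels, weights)
print(labels)  # [0, 1, 2, 1]
```

The loop visits every pixel in the FOV no matter how many pixels were used for training, so the cost scales with pixels × nodes × channels. The training subset only affects what values end up in `weights`.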
Yes, you are right, it's the update that makes the huge difference! Now it works smoothly :)
Hi! I came across your analysis pipeline and would love to try the pixel clustering, in order to avoid the issue with spatial signal overlap. I tested the pipeline using the Jupyter notebooks and the example dataset, and everything runs smoothly.
My images are pretty big (6134x9816 pixels), and after 24 h the analysis is still busy assigning the pixel clusters (Pixel_som_clustering.cluster_pixels function). Is there a way to speed up this process?