angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Pixel Subset in Pixie #1118

Closed: matthew-lee1 closed this issue 4 months ago

matthew-lee1 commented 4 months ago

Hello,

In the supplementary information of your Nature Communications paper, you show that a 10% subset of pixels still works. I am curious whether you tried other values as well, such as 7.5% or 5%. I'm asking because at 10% I run into memory errors (on a 120k × 60k pixel image), but I am able to run a 5% subset successfully.
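As a rough way to see why the subset fraction matters for memory, the sketch below estimates the size of a dense pixel-by-channel matrix for a given image size and subset fraction. It assumes float32 values and a hypothetical 20-channel panel (the thread does not state the channel count), and it ignores metadata columns, intermediate copies, and however pixie actually stores the subset, so real peak memory will be higher. The function name `subset_matrix_gb` is illustrative only.

```python
def subset_matrix_gb(height: int, width: int, n_channels: int,
                     fraction: float, bytes_per_value: int = 4) -> float:
    """Approximate size in GB of a dense (pixels x channels) matrix.

    Assumes every sampled pixel stores one value per channel at
    `bytes_per_value` bytes (4 = float32); ignores all other overhead.
    """
    n_sampled = height * width * fraction
    return n_sampled * n_channels * bytes_per_value / 1e9


# 120k x 60k image from the thread; 20 channels is a hypothetical panel size
for frac in (0.10, 0.075, 0.05, 0.01):
    gb = subset_matrix_gb(120_000, 60_000, 20, frac)
    print(f"{frac:.1%} subset: ~{gb:.1f} GB")
```

Under these assumptions, going from a 10% to a 5% subset shrinks the matrix from about 57.6 GB to about 28.8 GB; the estimate scales linearly with both the fraction and the number of channels.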

cliu72 commented 4 months ago

Hi @matthew-lee1! Yes, we have tested subsetting below 10%. For around a billion pixels, we have gone down to 1% and shown that the results are still good, so for your use case 5% should be fine. That being said, this can be specific to each dataset (i.e. how many markers you have and, more importantly, how many phenotypes you would "expect" and how well represented those phenotypes are in your image). If you have certain markers that are super rare and aren't sampled in the 5% subset, you could run into problems capturing those phenotypes. However, in our experience, a random 5% subset is fine for most well-designed panels. The best way to evaluate this is to look at the resulting pixel clusters across your markers and confirm that they reflect the underlying expression well. Hope this helps!
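To make the rare-marker caveat concrete, here is a back-of-the-envelope sketch. It assumes each pixel is kept independently with probability equal to the subset fraction f (Bernoulli sampling); pixie's own subsetting may instead draw a fixed number of pixels per FOV, which gives very similar numbers for large images. For a phenotype occupying k pixels, the expected number sampled is k·f and the chance that none are sampled is (1 − f)^k. The helper names are hypothetical.

```python
def expected_sampled(n_phenotype_pixels: int, fraction: float) -> float:
    """Expected number of a phenotype's pixels that end up in the subset."""
    return n_phenotype_pixels * fraction


def prob_entirely_missed(n_phenotype_pixels: int, fraction: float) -> float:
    """Probability that not a single pixel of the phenotype is sampled,
    assuming independent per-pixel sampling with probability `fraction`."""
    return (1.0 - fraction) ** n_phenotype_pixels


for k in (10, 100, 1_000, 10_000):
    print(f"k={k:>6}: expect {expected_sampled(k, 0.05):7.1f} sampled pixels, "
          f"P(missed) = {prob_entirely_missed(k, 0.05):.2e}")
```

The chance of missing a phenotype outright drops off quickly, but the expected count matters too: a phenotype covering only a few hundred pixels contributes only a handful of sampled pixels at 5%, which may be too few to form its own cluster. Checking the resulting clusters against your markers is what catches that case.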

matthew-lee1 commented 4 months ago

Thanks so much! Additionally, if all follow-up analyses will be cell-based, is it OK to cluster only on pixels within the cell segmentation masks?

cliu72 commented 4 months ago

Yup, if you only care about cells, it's ok to cluster only on pixels within segmentation masks.
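A minimal NumPy sketch of restricting the pixel matrix to cell pixels, assuming each FOV is available as a (height, width, channels) array and its segmentation as a same-sized integer label mask where 0 is background. The function name `pixels_in_cells` is hypothetical and not part of ark; selecting pixels directly with a boolean mask avoids having to zero out background pixels first.

```python
import numpy as np


def pixels_in_cells(img: np.ndarray, seg_mask: np.ndarray) -> np.ndarray:
    """Return an (n_pixels, n_channels) matrix of pixels inside any cell.

    img: (height, width, n_channels) expression array for one FOV.
    seg_mask: (height, width) integer label mask, 0 = background.
    """
    if img.shape[:2] != seg_mask.shape:
        raise ValueError("image and mask must have the same height and width")
    # boolean indexing over the first two axes keeps the channel axis intact
    return img[seg_mask > 0]


# toy example: 2 x 3 image with 2 channels, three pixels belong to cells
img = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)
mask = np.array([[0, 1, 1], [0, 0, 2]])
cell_pixels = pixels_in_cells(img, mask)
print(cell_pixels.shape)  # (3, 2)
```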

matthew-lee1 commented 4 months ago

Perfect, thanks! I've actually implemented this, and I think I might have found a bug. What I did in practice was, for any pixel labelled 0 in the segmentation mask (any non-cell pixel), set the expression of all channels to 0. I did this because the code already filters out pixels that are all 0, so it seemed like the easiest approach.

What I found was that for FOVs where no pixels were used (which can happen either because there are no cells OR because all pixels in the FOV sum to 0, so this isn't specific to my use case), the .feather file written out included a 0 for pixel_som_cluster. Later on, this introduced a new "cluster" with index 0 (a 101st cluster), which I'm sure you know causes problems downstream since everything should be 1-indexed. My solution was to include at least 1 pixel from every FOV, even if all of its pixels sum to 0.
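One way to catch this failure mode before it propagates is to validate the cluster labels after pixel clustering. The sketch below assumes the per-FOV results have been combined into a single pandas DataFrame with a `fov` column and the `pixel_som_cluster` column mentioned above, and that there are 100 SOM clusters labeled 1 to 100. The function name and the `fov` column are illustrative assumptions, not ark's API.

```python
import pandas as pd


def fovs_with_invalid_labels(pixel_df: pd.DataFrame, n_clusters: int = 100,
                             label_col: str = "pixel_som_cluster") -> list:
    """Return FOV names with any label outside the 1-indexed range 1..n_clusters.

    A placeholder 0 written for an empty FOV shows up here as an invalid label.
    """
    labels = pixel_df[label_col]
    invalid = pixel_df[(labels < 1) | (labels > n_clusters)]
    return sorted(invalid["fov"].unique().tolist())


# toy example: fov2 carries the spurious 0 label
df = pd.DataFrame({
    "fov": ["fov1", "fov1", "fov2"],
    "pixel_som_cluster": [3, 100, 0],
})
print(fovs_with_invalid_labels(df))  # ['fov2']
```

Any FOV it reports can then be dropped from the FOV list or, as in the workaround above, given at least one pixel so it is not empty.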

cliu72 commented 4 months ago

Thanks for catching that. It makes sense that this has never come up for us: we typically include all pixels (not just those within cells), and all of our images are non-zero, so we never have empty images. Your solution sounds like it works fine, but another option is to manually exclude all images that are entirely 0. You can change the list of FOVs with the `fovs` parameter in the notebook (the default includes all FOVs).
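If you go the exclusion route, the list passed as `fovs` can be built programmatically. This is a sketch assuming the FOV images (and, for a cells-only run, their segmentation masks) are already loaded as NumPy arrays keyed by FOV name; how you load them from disk, and the helper name `fovs_to_keep`, are assumptions rather than part of ark.

```python
from typing import Dict, List, Optional

import numpy as np


def fovs_to_keep(fov_images: Dict[str, np.ndarray],
                 fov_masks: Optional[Dict[str, np.ndarray]] = None) -> List[str]:
    """Return FOV names that still contain at least one non-zero value.

    fov_images: name -> (height, width, n_channels) array.
    fov_masks: optional name -> (height, width) label mask (0 = background);
        when given, only pixels inside cells are checked.
    """
    keep = []
    for name, img in fov_images.items():
        if fov_masks is not None:
            img = img[fov_masks[name] > 0]  # (n_cell_pixels, n_channels)
        if np.any(img):  # False for all-zero or empty arrays
            keep.append(name)
    return keep


# toy example: fovA is blank, fovB has signal but no cells, fovC has a cell with signal
imgs = {
    "fovA": np.zeros((2, 2, 3)),
    "fovB": np.ones((2, 2, 3)),
    "fovC": np.ones((2, 2, 3)),
}
masks = {
    "fovA": np.zeros((2, 2), dtype=int),
    "fovB": np.zeros((2, 2), dtype=int),
    "fovC": np.array([[0, 1], [0, 0]]),
}
fovs = fovs_to_keep(imgs)               # ['fovB', 'fovC']
fovs_cells = fovs_to_keep(imgs, masks)  # ['fovC']
```

Passing the mask covers the cells-only case from this thread, where an FOV can have signal overall but nothing left once background pixels are dropped.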