BodenmillerGroup / IMCDataAnalysis

R based workflow for multiplexed imaging data
https://bodenmillergroup.github.io/IMCDataAnalysis/
MIT License
29 stars 12 forks source link

Using plotCells in SPE object created from anndata object (downstream analysis using python packages) #44

Closed joaolsf closed 1 year ago

joaolsf commented 1 year ago

Hello @nilseling ,

I am using imcRtools and cytomapper after running the whole single-cell and spatial IMC analysis using Python packages, such as Astir, Scanpy and Squidpy. I converted the resulted anndata object to a SPE object and most of the functions in both R packages run smoothly. However, because I subset the anndata object to exclude all cells labelled as “undefined” when I did run the cell assignment step with Astir, I think this is causing an issue when I try to run the function plotCells with cytomapper.

I get the following error:

“Error in if (!all(colData(object)[, cell_id] == floor(colData(object)[, : missing value where TRUE/FALSE needed”

I assume this is because I excluded the "undefined" cells, their integer cell IDs are not present in the “ObjectNumber” slot, thus there is mismatch between the current integer cell IDs and the integer pixel values in the segmentation mask object. Do you think that would be a reason for this error? It is easy to go back and “add” the previously excluded cells back to the anndata object and export again to create the SPE object, but just wanted to know first if that would be the cause of the error.

nilseling commented 1 year ago

Hi @joaoufrj

it shouldn't be an issue that these cell IDs are missing. For visualisation cells are matched via their integer IDs in both the SPE object and the CytoImageList object. Could you please post the function call that you are using here and please also the output of sessionInfo()?

joaolsf commented 1 year ago

Hi @nilseling ,

I identified the issue and could fix it. It was something small related to changing the object number when converting the anndata to SPE. I can use the plotCells function now without error. However, when I run the function call below, something interesting happens:

plotCells(cur_masks, object = spe, cell_id = "ObjectNumber", img_id = "sample_id", colour_by = "cell_type", colour = list(cell_type = metadata(spe)$color_vectors$cell_type))

It plots the cell masks, it plots the legend with the cell types and their respective colours in the legend but it DOES NOT color the cells in the masks. The function also does not output any error. I tried using the same colour codes as in your tutorial, and it does not fix the error. I can plot the protein markers expression though, so I wonder if there's an issue with linking the cell IDs with cell types and the colour codes in the color_vectors in the metadata?

nilseling commented 1 year ago

Good to hear that the initial issue was fixed. The plotCells function matches cells via their image ID (img_id) and cell ID (cell_id). Could you please make sure that some cells of the masks that you want to visualize are included in your SPE object. Cells that are not included in the object are colored in grey (or the color defined by missing_colour).

joaolsf commented 1 year ago

I did some changes in selecting the masks to plot and the function now colours some cells but not as much as I expected or as much when plotting similar graphs using other packages. It might be something related to an unmatched integer ID between the cell_ID in the SPE object and the integer ID in the mask CytoImageList object. Btw, I wonder if would be useful to see the integer ID of each cell in the mask CytoImageList object to make sure they match the integer ID in the SPE object.

nilseling commented 1 year ago

To get the integer IDs from the masks you can call unique(CIL$mask_of_interest). If you want you can also send me the SPE object and the mask object and I can have a look. Otherwise can you post the the output of unique(spe[,spe$img_id == "mask_of_interest"]$cell_id) and unique(CIL$mask_of_interest) here?

joaolsf commented 1 year ago

Yes, I tried that but can't understand what info is the integer ID from the mask. How can I send you the SPE and the mask object?

nilseling commented 1 year ago

Ah ok, I see. So each mask is an x/y matrix. Each entry in the matrix is a pixel. The value of each pixel is either 0 (background) or an integer (1, 2, 3, etc). A set of pixels with the same integer ID represent a segmented cell. These integer IDs need to match the integer IDs stored in spe$cell_id or however you named this entry. You can either drop the .rds files here (if they are small) or send me a link via email (nils.eling@uzh.ch) from where I can download it.

joaolsf commented 1 year ago

I see! Ok I will send you the link for the rds files over email. Think will be easier that way.

nilseling commented 1 year ago

Thanks for sharing the objects. Looking at colData(spe)$ObjectNumber I see that you have continuously indexed your cells across the whole dataset. So your largest cell ID is 507515. You will need cell IDs that are matched to each segmentation mask individually. For example: your first segmentation mask contains cells 1 to 4567 and your second mask contains cells 1 to 5764. So currently I can't tell how to match cells from the SPE object to the masks.

I also noticed another issue: max(unique(as.vector(masks$Patient1_001))) return 65535. This is the maximum number that can be encoded by 16-bit encoding. How did you perform the segmentation?

joaolsf commented 1 year ago

Hi Nils,

Hmm ok, I fixed the cell IDs per each file matching the segmentation mask and still didn’t work. I think the issue is because I excluded the “undefined” cells when I was running the analysis in Python and the exported anndata object used to generate the SPE changed the original cell IDs (the ones generated by the segmentation). For segmentation I used DeepCell/ark-analysis pipeline.

Best wishes,

Joao Luiz, PhD MRC Research Co-Investigator Wellcome Centre for Integrative Parasitology School of Infection & Immunity, College of Medical Veterinary & Life Sciences, Sir Graham Davies Building, University of Glasgow, 120 University Place, Glasgow, G12 8TA, Scotland, UK

@thesilvafilho On 14 Dec 2022 at 3:36 PM +0000, Nils Eling @.***>, wrote:

Thanks for sharing the objects. Looking at colData(spe)$ObjectNumber I see that you have continuously indexed your cells across the whole dataset. So your largest cell ID is 507515. You will need cell IDs that are matched to each segmentation mask individually. For example: your first segmentation mask contains cells 1 to 4567 and your second mask contains cells 1 to 5764. So currently I can't tell how to match cells from the SPE object to the masks. I also noticed another issue: max(unique(as.vector(masks$Patient1_001))) return 65535. This is the maximum number that can be encoded by 16-bit encoding. How did you perform the segmentation? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

nilseling commented 1 year ago

I'm not sure if this happens during object conversion. Assuming you used zellkonverter it does not alter the cell metadata of the object. I would recommend to follow the steinbock framework for segmentation. It offers DeepCell segmentation using the pre-trainer Messmer model. You can use steinbock to export your data in anndata format to preform data analysis in python. Once you are done with this you can always convert back to R and use cytomapper.

joaolsf commented 1 year ago

I am trying different ways to integrate the python and R pipelines, and the zelkonverter only converts anndata to SCE object. With that I could run everything in imcRtools fine but didn’t try cell visualisation with cytomapper. In parallel, because I wanted to run the analysis in R using a SPE object, I exported the different slots of the anndata for each sample to csv files and edited them to look like the steinbok outputs. Then I used the read_steinbock function to generate the SPE. I think my mistake was to do this with the subsetted anndata, where I excluded the “undefined” cells. I assume that if I use the original anndata object, with all cells and their IDs as generated by the segmentation, the plotCells function will work. My next step is to try the steinbock framework for preprocessing and segmentation. On 15 Dec 2022 at 11:15 AM +0000, Nils Eling @.***>, wrote:

I'm not sure if this happens during object conversion. Assuming you used zellkonverter it does not alter the cell metadata of the object. I would recommend to follow the steinbock framework for segmentation. It offers DeepCell segmentation using the pre-trainer Messmer model. You can use steinbock to export your data in anndata format to preform data analysis in python. Once you are done with this you can always convert back to R and use cytomapper. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

nilseling commented 1 year ago

Hi @joaoufrj

imcRtools and cytomapper both work with SingleCellExperiment objects. I would not recommend to write out files in a way that are produced by steinbock. I will close this issue for now. Please open a new issue if you encounter new problems.