STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.
MIT License

General question about SingleR in stereopy #140

Closed limin321 closed 1 year ago

limin321 commented 1 year ago

I am trying to understand how SingleR works in Stereopy, but I got lost. There are two examples: one uses Mouse_brain_ref.anndata075.h5ad to annotate ./SS200000135TL_D1.cellbin.gef; the other uses ./MouseRNAseqData.h5ad to annotate './SS200000135TL_D1.tissue.gef'. My questions are:

  1. As a user with my own data, how do I decide which file to annotate: cellbin.gef or tissue.gef? How do I prepare the two reference files, Mouse_brain_ref.anndata075.h5ad and MouseRNAseqData.h5ad, and what is the difference between them?
  2. What is the difference between SingleR in Stereopy and the SingleR R package (https://bioconductor.org/packages/devel/bioc/vignettes/SingleR/inst/doc/SingleR.html)?

Thanks for the help.

Best, LC

limin321 commented 1 year ago

I followed the tutorial and ran into the following error. Here is the code I ran:

import stereo as st
from stereo.core.stereo_exp_data import AnnBasedStereoExpData
import warnings
warnings.filterwarnings('ignore')
test_file = './stereopy_demo/data/SS200000135TL_D1.cellbin.gef'
ref_file = './stereopy_demo/data/Mouse_brain_ref.anndata075.h5ad'

data = st.io.read_gef(test_file, bin_type = "cell_bins")
ref = AnnBasedStereoExpData(ref_file)
# preprocessing
ref.tl.log1p()
ref.tl.normalize_total()

data.tl.cal_qc()
data.tl.log1p()
data.tl.normalize_total()

# singleR
data.tl.single_r(
    ref_exp_data = ref,
    ref_use_col = 'ClusterName',
    res_key = 'annotation'
)

Here is the error message. It is too long to paste in full, so I only included the beginning and the end:

[Two screenshots of the error traceback: "Screen Shot 2023-06-30 at 2 13 31 PM" and "Screen Shot 2023-06-30 at 2 12 45 PM"]

Any suggestions on how to troubleshoot the error?

Best,

LC

UglyRay7 commented 1 year ago

Firstly, sorry for my late reply.

My colleague is working on the bug you reported.

I will answer your questions about SingleR.

  1. cellbin.gef is generated by combining the expression matrix (.gef) with the biochemical image (mask.tif) through a cell segmentation algorithm. That means cellbin.gef carries the cell information from the image, and its basic unit is a cell. Similarly, tissue.gef is produced by tissue segmentation, and its basic unit is a square bin (see the Quick Start tutorial for more).

Mouse_brain_ref.anndata075.h5ad and MouseRNAseqData.h5ad are both reference files for SingleR. MouseRNAseqData.h5ad is a lightweight reference converted from MouseRNAseqData.Rdata (provided in the R package). By comparison, Mouse_brain_ref.anndata075.h5ad contains richer annotation information.

  2. The SingleR function in Stereopy is a Python reimplementation built on the same basis as SingleR (the R package). You can learn about the algorithm from the paper.
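For readers who want a feel for the core idea before reading the paper, here is a minimal pure-Python sketch of the scoring step as described for the original SingleR: correlate the query cell with each reference profile using Spearman correlation, take a high quantile (0.8) of the correlations per label, and pick the top label. This is an illustration only, not Stereopy's implementation. It skips marker-gene selection and fine-tuning, and the function names (`ranks`, `spearman`, `quantile`, `annotate`) and the toy data are made up for the example.

```python
from collections import defaultdict


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def annotate(query_cell, ref_cells, ref_labels, q=0.8):
    """Score each label by the q-quantile of its correlations; return the best."""
    per_label = defaultdict(list)
    for profile, label in zip(ref_cells, ref_labels):
        per_label[label].append(spearman(query_cell, profile))
    scores = {label: quantile(sorted(c), q) for label, c in per_label.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): 5 genes, 4 reference cells, 2 labels.
ref_cells = [
    [9, 8, 1, 0, 2],  # neuron
    [8, 9, 0, 1, 3],  # neuron
    [0, 1, 9, 8, 2],  # astrocyte
    [1, 0, 8, 9, 3],  # astrocyte
]
ref_labels = ["neuron", "neuron", "astrocyte", "astrocyte"]
query = [7, 9, 1, 0, 2]

label, scores = annotate(query, ref_cells, ref_labels)
print(label, scores)
```

Because the score is rank-based, the query and the reference only need comparable gene ordering within a cell, not identical scales.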

Hope my answer will be helpful! Ray

Zhenbin24 commented 1 year ago

Firstly, sorry for my late reply.

In the SingleR algorithm, the expression column field is not filtered when traversing the exp_matrix data, so a later step reports that the column index does not exist.

A fix for this issue will be included in the next release.
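As a rough illustration of this kind of failure, and not the actual patch, the sketch below assumes the problem comes from indexing columns (genes) that exist in one matrix but not the other. It restricts both matrices to their shared genes before any column lookup. The helper names `align_genes` and `subset_columns`, the gene lists, and the matrices are all hypothetical.

```python
def align_genes(query_genes, ref_genes):
    """Return shared genes (in query order) and their column positions in each matrix."""
    ref_index = {gene: i for i, gene in enumerate(ref_genes)}
    shared = [g for g in query_genes if g in ref_index]
    query_pos = [i for i, g in enumerate(query_genes) if g in ref_index]
    ref_pos = [ref_index[g] for g in shared]
    return shared, query_pos, ref_pos


def subset_columns(matrix, positions):
    """Keep only the given columns of a row-major matrix (list of rows)."""
    return [[row[p] for p in positions] for row in matrix]


# Hypothetical gene lists: "Xist" is missing from the reference and
# "Aqp4" is missing from the query, so neither may be used as a column key.
query_genes = ["Gad1", "Slc17a7", "Xist", "Gfap"]
ref_genes = ["Gfap", "Gad1", "Slc17a7", "Aqp4"]
query_matrix = [[5, 0, 2, 1], [0, 7, 0, 3]]
ref_matrix = [[9, 0, 1, 4], [0, 6, 8, 0]]

shared, q_pos, r_pos = align_genes(query_genes, ref_genes)
query_aligned = subset_columns(query_matrix, q_pos)
ref_aligned = subset_columns(ref_matrix, r_pos)
print(shared)
```

After this step both matrices have the same columns in the same order, so a lookup by shared gene cannot hit a missing index.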

limin321 commented 1 year ago

Hi Ray,

Thanks for the clarification. That helps a lot.

limin321 commented 1 year ago

@Zhenbin24,

Thanks for the explanation and troubleshooting. I will wait for the next release and try again then. The problem is solved, so I am going to close this topic.

Best, LC