STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.
MIT License

About cell annotation - reset .gene_names #323

Closed: x11xx1 closed this issue 6 days ago.

x11xx1 commented 2 weeks ago

Hi, I'm seeking your kind help. I tried to use SingleR in Stereopy to annotate the cells in SAW8 output, with BlueprintEncodeData.h5ad from SingleR as the reference. The code is below and follows your tutorials.

import stereo as st

test_file = './data/SN.tissue.gef'
ref_file = './data/BlueprintEncodeData.h5ad'

data = st.io.read_gef(test_file, bin_size=50)
ref = st.io.read_h5ad(ref_file)

# preprocessing
ref.tl.log1p()
ref.tl.normalize_total()

data.tl.cal_qc()
data.tl.log1p()
data.tl.normalize_total()

data.tl.single_r(
    ref_exp_data=ref,
    ref_use_col='ClusterName',
    res_key='annotation',
    # method='rapids',  # setting method='rapids' runs SingleR on the GPU
)

Running it raised AssertionError: no gene of test_exp_data.gene_names in ref_exp_data.gene_names. I then found that data.gene_names contains Ensembl IDs while ref.gene_names contains HGNC symbols, so I tried to replace the test gene names with symbols.
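The assertion describes the case where the test and reference gene-name lists share no entries, which is what happens when one side uses Ensembl IDs and the other HGNC symbols. A minimal, Stereopy-independent sketch of that condition; the function name shared_genes and the toy gene lists are illustrative assumptions, not Stereopy's code:

```python
def shared_genes(test_genes, ref_genes):
    """Return the test genes that also occur in the reference, in test order."""
    ref_set = set(ref_genes)
    return [g for g in test_genes if g in ref_set]


# Toy example: test genes as Ensembl IDs, reference genes as HGNC symbols.
test_genes = ["ENSG00000141510", "ENSG00000012048"]  # TP53, BRCA1
ref_genes = ["TP53", "BRCA1", "GAPDH"]

overlap = shared_genes(test_genes, ref_genes)
print(len(overlap))  # 0: no shared names, the situation the AssertionError reports
```

With symbols on both sides, for example shared_genes(["TP53", "BRCA1"], ref_genes), the overlap is non-empty.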

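import pandas as pd

# mapping_df: HGNC table with 'Ensembl gene ID' and 'Approved symbol' columns (loading not shown)
ensembl_to_hgnc = pd.Series(mapping_df['Approved symbol'].values, index=mapping_df['Ensembl gene ID']).to_dict()
# Fall back to the original ID when no symbol is known
new_gene_names = [ensembl_to_hgnc.get(gene_id, gene_id) for gene_id in data.gene_names]
data.gene_names = new_gene_names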

However, assigning to data.gene_names fails with:

data.gene_names = new_gene_names
AttributeError: can't set attribute

Is there any way to change the gene_names? Many thanks 🙏
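When no symbol column is available, the Ensembl-to-symbol mapping in the snippet above can at least be checked before use. A minimal sketch, assuming mapping_df has the 'Ensembl gene ID' and 'Approved symbol' columns used above; the function name map_ensembl_to_symbol and the toy table are hypothetical. It only builds and checks the list of names; it does not change how Stereopy stores gene names, so it does not avoid the AttributeError.

```python
from collections import Counter

import pandas as pd


def map_ensembl_to_symbol(gene_ids, mapping_df):
    """Map Ensembl IDs to HGNC symbols, keeping the ID when no symbol is known.

    Returns the new names, the IDs that had no symbol, and any names that now
    occur more than once.
    """
    # Ignore incomplete rows and keep the first symbol listed for each ID.
    clean = mapping_df.dropna(subset=["Ensembl gene ID", "Approved symbol"])
    clean = clean.drop_duplicates(subset="Ensembl gene ID", keep="first")
    lookup = dict(zip(clean["Ensembl gene ID"], clean["Approved symbol"]))

    new_names = [lookup.get(gene_id, gene_id) for gene_id in gene_ids]
    unmapped = [gene_id for gene_id in gene_ids if gene_id not in lookup]
    counts = Counter(new_names)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return new_names, unmapped, duplicated


# Hypothetical toy table: ID1 and ID2 share one symbol, and ID4 has no entry.
mapping_df = pd.DataFrame(
    {
        "Ensembl gene ID": ["ID1", "ID2", "ID3"],
        "Approved symbol": ["GENE_A", "GENE_A", "GENE_B"],
    }
)
names, unmapped, duplicated = map_ensembl_to_symbol(["ID1", "ID2", "ID3", "ID4"], mapping_df)
print(names)       # ['GENE_A', 'GENE_A', 'GENE_B', 'ID4']
print(unmapped)    # ['ID4']
print(duplicated)  # ['GENE_A']
```

Several Ensembl IDs can map to one symbol, so checking for duplicated names is worthwhile before symbols are used as gene identifiers.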

tanliwei-coder commented 2 weeks ago

@x11xx1

Does data.genes have a real_gene_name column? If so, set the parameter gene_name_index=True when reading test_file, so that the gene symbols are used as the index instead of the IDs.
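In Stereopy itself, this is one extra argument on the reading call from the original snippet: data = st.io.read_gef(test_file, bin_size=50, gene_name_index=True). The idea of switching the index from IDs to symbols can be pictured with a toy pandas genes table; the table layout and the index name gene_id are assumptions for illustration, not Stereopy's internal code:

```python
import pandas as pd

# Toy genes table indexed by Ensembl ID, with a real_gene_name column of symbols.
genes = pd.DataFrame(
    {"real_gene_name": ["TP53", "BRCA1"]},
    index=pd.Index(["ENSG00000141510", "ENSG00000012048"], name="gene_id"),
)

# Use the symbols as the index; the IDs are kept as an ordinary column.
by_symbol = genes.reset_index().set_index("real_gene_name")

ref_genes = {"TP53", "BRCA1", "GAPDH"}
print(list(by_symbol.index))                         # ['TP53', 'BRCA1']
print(sum(g in ref_genes for g in by_symbol.index))  # 2 genes now match the reference
print(by_symbol.index.is_unique)                     # True
```

Checking that the symbol index is unique is a cheap safeguard, since unlike Ensembl IDs, symbols are not guaranteed to be one per gene row.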

x11xx1 commented 2 weeks ago

The real_gene_name column holds HGNC symbols, and setting gene_name_index=True works. Thanks!!!