Starlitnightly / omicverse

A python library for multi omics included bulk, single cell and spatial RNA-seq analysis.
https://starlitnightly.github.io/omicverse/
GNU General Public License v3.0
437 stars 48 forks source link

KeyError 'base' in useing the function named ov.single.cosg #110

Closed zhangdae closed 2 months ago

zhangdae commented 3 months ago

Im trying to find DEGs between the leiden clusters use ov.single.cosg Here is a simple example. adata

AnnData object with n_obs × n_vars = 117399 × 21054
    obs: 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'nUMIs', 'mito_perc', 'detected_genes', 'cell_complexity', 'n_genes', 'doublet_score', 'predicted_doublet', 'passing_mt', 'passing_nUMIs', 'passing_ngenes', 'leiden', 'leiden_0.2', 'leiden_0.4', 'leiden_0.6', 'leiden_0.8', 'leiden_1.0', 'leiden_1.2', 'leiden_1.4'
    var: 'gene_ids', 'feature_types', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'percent_cells', 'robust', 'mean', 'var', 'residual_variances', 'highly_variable_rank', 'highly_variable_features'
    uns: 'hvg', 'layers_counts', 'leiden', 'log1p', 'neighbors', 'sample_colors', 'scaled|original|cum_sum_eigenvalues', 'scaled|original|pca_var_ratios', 'scrublet', 'umap', 'dendrogram_leiden_0.8', 'leiden_0.8_colors', 'cosg'
    obsm: 'X_harmony', 'X_pca', 'X_umap', 'scaled|original|X_pca'
    varm: 'scaled|original|pca_loadings'
    layers: 'counts', 'lognorm', 'scaled'
    obsp: 'connectivities', 'distances'

ov.single.cosg(adata,groupby='leiden_0.8',key_added='cosg')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[80], line 1
----> 1 ov.single.cosg(adata,groupby='leiden_0.8',key_added='cosg')

File [~/miniconda/envs/omicverse/lib/python3.10/site-packages/omicverse/single/_cosg.py:489](http://biotrainee.vip:8052/home/data/miniconda/envs/omicverse/lib/python3.10/site-packages/omicverse/single/_cosg.py#line=488), in cosg(adata, groupby, groups, mu, remove_lowly_expressed, expressed_pct, n_genes_user, key_added, calculate_logfoldchanges, use_raw, layer, reference, copy)
    485         raise ValueError(
    486             f'reference = {reference} needs to be one of groupby = {cats}.'
    487         )
    488     pts=False
--> 489     anndata_obj = _RankGenes(adata, groups_order2, groupby, reference, use_raw, layer, pts)
    490     anndata_obj._basic_stats()
    493 ### Refer to Scanpy
    494 # for correct getnnz calculation
    495 ### get non-zeros for columns

File [~/miniconda/envs/omicverse/lib/python3.10/site-packages/omicverse/single/_cosg.py:207](http://biotrainee.vip:8052/home/data/miniconda/envs/omicverse/lib/python3.10/site-packages/omicverse/single/_cosg.py#line=206), in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts)
    196 def __init__(
    197     self,
    198     adata,
   (...)
    204     comp_pts=False,
    205 ):
--> 207     if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None:
    208         self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base']))
    209     else:

KeyError: 'base'

I have found some errors like this https://github.com/scverse/scanpy/issues/2239 One comment refers that he add adata.uns['log1p']["base"] = None,so I tryied that advice bdata=adata bdata.uns['log1p']["base"] = None ov.single.cosg(bdata,groupby='leiden_0.8',key_added='cosg')

It literally fixed the problem,but something still goes wrong

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[86], line 1
----> 1 ov.single.cosg(bdata,groupby='leiden_0.8',key_added='cosg')

File [~/miniconda/envs/omicverse/lib/python3.10/site-packages/omicverse/single/_cosg.py:566](http://biotrainee.vip:8052/home/data/miniconda/envs/omicverse/lib/python3.10/site-packages/omicverse/single/_cosg.py#line=565), in cosg(adata, groupby, groups, mu, remove_lowly_expressed, expressed_pct, n_genes_user, key_added, calculate_logfoldchanges, use_raw, layer, reference, copy)
    562 for col in rank_stats.columns.levels[0]:
    563     adata.uns[key_added][col]=rank_stats[col].to_records(
    564 index=False, column_dtypes=dtypes[col]
    565 )
--> 566 adata.uns[key_added]['pvals']=adata.uns['rank_genes_groups']['pvals']
    567 adata.uns[key_added]['pvals_adj']=adata.uns['rank_genes_groups']['pvals_adj']
    571 print('**finished identifying marker genes by COSG**')

KeyError: 'rank_genes_groups'

In the line 566,it try to assign adata.uns['rank_genes_groups'] to adata.uns[key_added],but the structure fo anndata is like below the uns['cosg'] had added to anndata,but nothing names rank_genes_groups

AnnData object with n_obs × n_vars = 117399 × 21054
    obs: 'sample', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'nUMIs', 'mito_perc', 'detected_genes', 'cell_complexity', 'n_genes', 'doublet_score', 'predicted_doublet', 'passing_mt', 'passing_nUMIs', 'passing_ngenes', 'leiden', 'leiden_0.2', 'leiden_0.4', 'leiden_0.6', 'leiden_0.8', 'leiden_1.0', 'leiden_1.2', 'leiden_1.4'
    var: 'gene_ids', 'feature_types', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'percent_cells', 'robust', 'mean', 'var', 'residual_variances', 'highly_variable_rank', 'highly_variable_features'
    uns: 'hvg', 'layers_counts', 'leiden', 'log1p', 'neighbors', 'sample_colors', 'scaled|original|cum_sum_eigenvalues', 'scaled|original|pca_var_ratios', 'scrublet', 'umap', 'dendrogram_leiden_0.8', 'leiden_0.8_colors', 'cosg'
    obsm: 'X_harmony', 'X_pca', 'X_umap', 'scaled|original|X_pca'
    varm: 'scaled|original|pca_loadings'
    layers: 'counts', 'lognorm', 'scaled'
    obsp: 'connectivities', 'distances'

And it looks like alright bdata.uns['cosg']

{'params': {'groupby': 'leiden_0.8',
  'reference': 'rest',
  'groups': 'all',
  'method': 'cosg',
  'use_raw': True,
  'layer': None},
 'names': rec.array([('Ccl11', 'Notum', 'Aqp4', 'Fst', 'Otx2', 'Slc47a1', 'Csf1r', 'Nrxn3', 'Rgs5', 'Ptprb', 'Sorcs3', 'Vcan', 'Epyc', 'Pou3f2', 'Ceacam16', 'Dytn', 'Mpz', '6330411D24Rik', 'Pbk', 'Ly6d', 'Zfp804a', 'AY036118', 'Fxyd2', 'G630064G18Rik', 'Grik1', 'Esrrb', 'Gcg', 'Prss3', 'Gm29266', 'Cxcr2', 'C1qb', 'Cobl', 'Gypa', 'Cd3g', 'Anks1b', 'Pkp1', 'Gp1bb', 'Vmo1', 'Scgb3a2', 'March4', 'Mog', 'Gm20619', 'Dct', 'Notumos', 'Actn2', 'Gm3776'),
            ('Osr2', 'Emilin2', 'Tdrd12', 'Inmt', '1810028F09Rik', 'Gm42397', 'Ifi207', 'Grik1', 'Aspn', 'Adgrl4', 'Glra3', 'Crabp1', 'Otos', 'Gpr37l1', 'Prss36', 'Chrna10', 'Art3', 'Rassf6', 'Aurkb', 'Cd79b', 'Cadps', 'Gm19951', 'Ptgds', 'Pax3', 'Gm37821', 'Kcne1', 'Gm26512', 'Fam25c', 'Meis1', 'Mmp9', 'C1qa', 'Prss36', 'Slc4a1', 'Trbc2', 'Epha3', 'Atp6v0a4', '9130015G15Rik', 'Rnase1', 'Adipoq', 'Ripor3', 'Nkx6-2', 'Satb2', 'Tyr', 'Rarres1', 'Slc6a7', 'Gsta1'),
            ('Gpr83', 'Tnfaip6', 'Kcnj15', 'Dpep1', 'Tm4sf5', 'Slc22a6', 'Ms4a6c', 'Unc13c', 'Slc38a11', 'Cldn5', 'Aass', 'Igf2bp1', 'Fbxo2', 'Dhh', 'Epyc', 'Loxhd1', 'Ncmap', 'Gm14066', 'Bub1', 'Cd79a', 'Ppp1r1c', 'Lars2', 'Apod', 'Tyr', 'Unc13c', 'Dnase1', '5430419D17Rik', 'Tdh', 'Dach2', 'Retnlg', 'C1qc', 'BC006965', 'Alas2', 'Itk', 'Grid2', 'Krt14', 'Iyd', 'Cndp2', 'Hbb-bs', '1700029N11Rik', 'Hapln2', 'Col22a1', 'Gpnmb', 'Cxcl13', 'Clca2', 'Gsta2'),
sc.pl.rank_genes_groups(bdata, key='cosg')

image

zhangdae commented 3 months ago

omicverse 1.6.3 ,anndata 0.10.8

Version

name: omicverse channels:

Doctorluka commented 3 months ago

This looks like a bug caused by scanpy. You can run the following code before ov.single.cosg:

import numpy as np
vessel.uns['log1p']['base'] = np.e

and then run the ov.single.cosg like:

sc.tl.dendrogram(vessel, groupby='celltype',use_rep='X_scANVI')
sc.tl.rank_genes_groups(vessel, groupby='celltype', 
                        method='wilcoxon',use_rep='X_scANVI',)
ov.single.cosg(vessel, key_added='celltype_cosg', groupby='celltype')
Starlitnightly commented 2 months ago

This looks like a bug caused by scanpy. You can run the following code before ov.single.cosg:

import numpy as np
vessel.uns['log1p']['base'] = np.e

and then run the ov.single.cosg like:

sc.tl.dendrogram(vessel, groupby='celltype',use_rep='X_scANVI')
sc.tl.rank_genes_groups(vessel, groupby='celltype', 
                        method='wilcoxon',use_rep='X_scANVI',)
ov.single.cosg(vessel, key_added='celltype_cosg', groupby='celltype')

maybe vessel.uns['log1p']['base'] set to None is better.