genecell / COSG

Accurate and fast cell marker gene identification with COSG
https://cosg.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

missing logfoldchanges in cosg #1

Open pcahan1 opened 3 years ago

pcahan1 commented 3 years ago

Hi,

Looks to be a very useful tool. I am able to run cosg, but when I try to produce a dotplot, I receive the following error. I think it is because cosg is not adding the logfoldchanges to the result. Thank you for any help.

sc.logging.print_header()
scanpy==1.8.0 anndata==0.7.6 umap==0.4.6 numpy==1.21.0 scipy==1.7.0 pandas==1.2.5 scikit-learn==0.24.2 statsmodels==0.12.0 python-igraph==0.9.1 leidenalg==0.8.3 pynndescent==0.5.2

sc.pl.rank_genes_groups_dotplot(adTest,groupby='leiden',cmap='Spectral_r',n_genes=3,key='cosg', standard_scale='var')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/scanpy/plotting/_tools/__init__.py", line 863, in rank_genes_groups_dotplot
    return _rank_genes_groups_plot(
  File "/usr/local/lib/python3.8/site-packages/scanpy/plotting/_tools/__init__.py", line 487, in _rank_genes_groups_plot
    df = rank_genes_groups_df(
  File "/usr/local/lib/python3.8/site-packages/scanpy/get/get.py", line 64, in rank_genes_groups_df
    d = [pd.DataFrame(adata.uns[key][c])[group] for c in colnames]
  File "/usr/local/lib/python3.8/site-packages/scanpy/get/get.py", line 64, in <listcomp>
    d = [pd.DataFrame(adata.uns[key][c])[group] for c in colnames]
KeyError: 'logfoldchanges'
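The KeyError is raised when the plotting code looks up a field that the results mapping does not contain. A minimal sketch of a check that could be run before plotting, using a plain dict as a stand-in for adata.uns['cosg']. The helper name missing_result_fields, the stand-in contents, and the list of expected fields are assumptions for illustration only; this thread confirms just 'names' and 'logfoldchanges' as field names.

```python
def missing_result_fields(results, expected):
    """Return the expected field names that are absent from a results mapping."""
    return [field for field in expected if field not in results]


# Hypothetical stand-in for adata.uns['cosg'] before the fix:
# it has marker names and scores, but no 'logfoldchanges' entry.
cosg_results = {
    "names": ["geneA", "geneB"],
    "scores": [0.9, 0.8],
}

# Assumed subset of the fields the plotting function reads.
expected_fields = ["names", "scores", "logfoldchanges"]

missing = missing_result_fields(cosg_results, expected_fields)
print(missing)  # ['logfoldchanges']
```

With a real AnnData object, the same idea would be to test whether 'logfoldchanges' is a key of adata.uns['cosg'] before calling the dotplot function.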

genecell commented 3 years ago

Hi @pcahan1,

Thank you for your interest in COSG and for pointing out this bug. I have pushed a commit that adds logfoldchanges to the result:

import cosg as cosg
import scanpy as sc
adata = sc.datasets.pbmc68k_reduced()
cosg.cosg(adata, key_added='cosg', groupby='bulk_labels')
import pandas as pd

Check the logfoldchanges:

pd.DataFrame(adata.uns['cosg']['logfoldchanges']).head()

The output:

   CD14+ Monocyte   CD19+ B  ...  CD8+/CD45RA+ Naive Cytotoxic  Dendritic
0        2.399848  7.609315  ...                      4.955223   5.666845
1        5.350822  4.039919  ...                      3.359031   3.815426
2        4.644826  7.454791  ...                      4.160255   5.163846
3        4.276776  6.861791  ...                      2.553756   3.435296
4        3.314361  5.755856  ...                      3.068707   3.855522

[5 rows x 10 columns]

I had previously tested COSG only on Scanpy 1.7, which did not require the 'logfoldchanges' key.

Thanks!

Best regards, Min

woshiyangsi commented 2 years ago

Hi, thanks for your excellent work. I ran into the same error. Do you know how to solve it?

>>> cosg.cosg(dat,
...     key_added='cosg',
...     mu=1,
...     n_genes_user=50,
...     groupby='seurat_clusters')
finished identifying marker genes by COSG
>>> sc.pl.rank_genes_groups_dotplot(dat,
...     groupby='seurat_clusters',
...     cmap='Spectral_r',
...     standard_scale='var',
...     n_genes=3,
...     key='cosg',
...     save="seurat_clusters_cosg_top10")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scCell2/miniconda3/lib/python3.8/site-packages/scanpy/plotting/_tools/__init__.py", line 863, in rank_genes_groups_dotplot
    return _rank_genes_groups_plot(
  File "/home/scCell2/miniconda3/lib/python3.8/site-packages/scanpy/plotting/_tools/__init__.py", line 487, in _rank_genes_groups_plot
    df = rank_genes_groups_df(
  File "/home/scCell2/miniconda3/lib/python3.8/site-packages/scanpy/get/get.py", line 64, in rank_genes_groups_df
    d = [pd.DataFrame(adata.uns[key][c])[group] for c in colnames]
  File "/home/scCell2/miniconda3/lib/python3.8/site-packages/scanpy/get/get.py", line 64, in <listcomp>
    d = [pd.DataFrame(adata.uns[key][c])[group] for c in colnames]
KeyError: 'logfoldchanges'

genecell commented 2 years ago

@woshiyangsi Hi, thank you for your interest! I will fix this error soon. In the meantime, you can use the following code to visualize the top marker genes identified by COSG in Scanpy 1.8:

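import numpy as np
import pandas as pd
import scanpy as sc

# Top 5 marker genes per group; after .T, rows are groups and columns are ranks
df_tmp = pd.DataFrame(adata.uns['cosg']['names'][:5,]).T
# Order the rows to match the cluster categories (reindex returns a new frame)
df_tmp = df_tmp.reindex(adata.obs['Leiden'].cat.categories)
# Flatten row by row: all genes of the first group, then the second, and so on
marker_genes_list = np.ravel(df_tmp)
sc.pl.dotplot(adata, marker_genes_list,
             groupby='Leiden',
             dendrogram=False,
             standard_scale='var',
             cmap='Spectral_r')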

You can change 'Leiden' to 'seurat_clusters' in your code.
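To make the ordering of the flattened marker list explicit, here is a minimal pure-Python sketch of the same idea: take the top N names per group and concatenate them in category order. The function name flatten_top_markers and the gene names are made up for illustration; they are not part of COSG or Scanpy.

```python
def flatten_top_markers(names_by_group, group_order, n_top):
    """Concatenate the first n_top marker names of each group, in group_order."""
    flat = []
    for group in group_order:
        flat.extend(names_by_group[group][:n_top])
    return flat


# Hypothetical ranked marker names per cluster (placeholder gene names).
names_by_group = {
    "1": ["g1a", "g1b", "g1c"],
    "0": ["g0a", "g0b", "g0c"],
}

# The category order decides the order of the genes on the plot axis.
marker_genes_list = flatten_top_markers(names_by_group, ["0", "1"], n_top=2)
print(marker_genes_list)  # ['g0a', 'g0b', 'g1a', 'g1b']
```

The resulting list keeps each cluster's genes together, which is what lets the dot plot show a block of markers per cluster.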

Best, Min

woshiyangsi commented 1 year ago

Thanks!