bioinfo-ibms-pumc / SCSA

SCSA: cell type annotation for single-cell RNA-seq data
GNU General Public License v3.0

ValueError #9

Open leeyoyohku opened 3 years ago

leeyoyohku commented 3 years ago

Hi, SCSA team. I'm running SCSA on Scanpy output, but most clusters come out as 'endothelial' with '?' in the Type column. I got that output by reanalysing a published lung dataset (PRJEB31843). Since the published project provides a processed h5ad file with annotated cell types (in a column named 'Celltypes_updated_July_2020'), I tried to reproduce the annotation from that file, which should look similar to the original one. I used the following code in Python 3:

    sc.tl.rank_genes_groups(adata, 'Celltypes_updated_July_2020')
    result = adata.uns['rank_genes_groups']
    groups = result['names'].dtype.names
    dat = pd.DataFrame({group + '_' + key[:1]: result[key][group]
                        for group in groups
                        for key in ['names', 'logfoldchanges', 'scores', 'pvals']})
    dat.to_csv("lung_ori.csv")

and ran SCSA in the terminal:

    python SCSA.py -d whole.db -i /home/yoyolab/0_Public_data/1_scRNA/1_Lung/2_PRJEB31843/scanpy_lung_ori.csv -s scanpy -E -f1.5 -p 0.01 -o lung_ori -m txt

but got the following ValueError:

    Version V1.1 [2020/07/03]
    DB load: 47347 3 3 48257 37440
    Namespace(Gensymbol=True, MarkerDB=None, celltype='normal', cluster='all', db='whole.db', foldchange=1.5, input='/home/yoyolab/0_Public_data/1_scRNA/1_Lung/2_PRJEB31843/scanpy_lung_ori.csv', list_tissue=False, noprint=False, norefdb=False, outfmt='txt', output='lung_ori', pvalue=0.01, source='scanpy', species='Human', target='cellmarker', tissue='All', weight=100.0)
    Version V1.1 [2020/07/03]
    DB load: 47347 3 3 48257 37440
    load markers: 45409
    Traceback (most recent call last):
      File "SCSA.py", line 1277, in <module>
        p.run_cmd(args)
      File "SCSA.py", line 1243, in run_cmd
        outs = anno.run_detail_cmd()
      File "SCSA.py", line 1171, in run_detail_cmd
        outs = self.calcu_scanpy_group(self.args.input,self.args.Gensymbol)
      File "SCSA.py", line 477, in calcu_scanpy_group
        k,v = c.split("_")
    ValueError: too many values to unpack (expected 2)

Could you please suggest how I could get a more accurate result instead of my current reanalysed output, and also how to resolve the ValueError? Many thanks!

bioinfo-ibms-pumc commented 3 years ago

Sorry for the inconvenience. Could you please paste the header (title) line of your file "lung_ori.csv"? I'll check it. You can also download the sample file "scanpy_pbmc_3k.csv", compare it with your file, and check for format errors yourself.
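A minimal sketch of such a header check, going only by the traceback above: `k,v = c.split("_")` can unpack into two names only when a column name contains exactly one underscore, so any group label that itself contains an underscore would trigger this ValueError once `'_' + key[:1]` is appended. The helper names (`read_header`, `find_bad_columns`, `sanitize_group`) and the toy group labels `AT1` and `T_CD4` are hypothetical and chosen for illustration; they are not part of SCSA and are not taken from the dataset. The empty first column name is skipped here only because pandas' `to_csv` writes the index under an empty header; how SCSA treats that column is not shown in this thread.

```python
import csv
import io


def read_header(handle):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(handle))


def find_bad_columns(columns, sep="_"):
    """Return names that do not split into exactly two parts on `sep`.

    Empty names (the unnamed index column) are skipped; this is an
    assumption about the file layout, not confirmed SCSA behaviour.
    """
    return [c for c in columns if c and len(c.split(sep)) != 2]


def sanitize_group(name, sep="_", repl="-"):
    """Hypothetical workaround: replace `sep` inside a group label."""
    return name.replace(sep, repl)


# Toy header in the layout produced by the export code in the question.
# The group labels "AT1" and "T_CD4" are made up for this example.
toy_csv = ",AT1_n,AT1_l,T_CD4_n,T_CD4_l\n0,SFTPC,2.1,CD3E,1.8\n"

header = read_header(io.StringIO(toy_csv))
bad = find_bad_columns(header)
print(bad)  # ['T_CD4_n', 'T_CD4_l']

# After replacing the underscore in the label, the name splits cleanly.
fixed = sanitize_group("T_CD4") + "_n"
print(fixed, find_bad_columns([fixed]))  # T-CD4_n []
```

For a real file, `read_header(open("lung_ori.csv", newline=""))` would supply the header. If the check does report names, one option is to apply something like `sanitize_group(group)` when building the column names in the export code; whether the renamed labels then pass through the rest of SCSA unchanged is not confirmed here.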