MGI-tech-bioinformatics / DNBelab_C_Series_HT_scRNA-analysis-software

An open source and flexible pipeline to analyze high-throughput DNBelab C Series single-cell RNA datasets
MIT License

ERROR - similarity: main.c:369: create_index_array: Assertion `((infos)->n) > 0' failed. #94

Open yjzhang1020 opened 2 months ago

yjzhang1020 commented 2 months ago

Hello, I ran into the following error when running dnbc4tools rna run. The command I ran was:

dnbc4tools rna run \
  --name scRNA_pool6 \
  --cDNAfastq1 /home/data/t020559/DNBelab_test/PRJCA021248/st1_data/HRR1445549_f1.fq.gz \
  --cDNAfastq2 /home/data/t020559/DNBelab_test/PRJCA021248/st1_data/HRR1445549_r2.fq.gz \
  --oligofastq1 /home/data/t020559/DNBelab_test/PRJCA021248/st1_data/HRR1445549_f1.fq.gz \
  --oligofastq2 /home/data/t020559/DNBelab_test/PRJCA021248/st1_data/HRR1445549_f1.fq.gz \
  --genomeDir /home/data/t020559/ref/homo/homo_gencode_dnbc4_index \
  --threads 8

The error was:

2024-08-12 19:36:30 Calculating bead similarity and merging beads within the same droplet.
2024-08-12 19:36:31,535 - count - ERROR - Command failed with exit code 134
2024-08-12 19:36:31,536 - count - ERROR - similarity: main.c:369: create_index_array: Assertion `((infos)->n) > 0' failed.
Aborted (core dumped)

Traceback (most recent call last):
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/bin/dnbc4tools", line 8, in <module>
    sys.exit(main())
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/dnbc4tools.py", line 110, in main
    args.func(args)
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/count.py", line 184, in count
    Count(args).run()
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/count.py", line 58, in run
    logging_call(similiarBeads_cmd_str,'count',self.outdir)
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/tools/utils.py", line 128, in logging_call
    raise e
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/tools/utils.py", line 123, in logging_call
    output = subprocess.check_output(popenargs, shell=True, stderr=subprocess.STDOUT, universal_newlines=True)
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/software/similarity -n 8 scRNA_pool6 /home/data/t020559/DNBelab_test/PRJCA021248/st3_outs/scRNA_pool6/01.data/CB_UB_count.txt /home/data/t020559/DNBelab_test/PRJCA021248/st3_outs/scRNA_pool6/02.count/beads.barcodes.umi100.txt /home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/config/cellbarcode/oligo_type.txt /home/data/t020559/DNBelab_test/PRJCA021248/st3_outs/scRNA_pool6/02.count/similarity.all.csv /home/data/t020559/DNBelab_test/PRJCA021248/st3_outs/scRNA_pool6/02.count/similarity.droplet.csv /home/data/t020559/DNBelab_test/PRJCA021248/st3_outs/scRNA_pool6/02.count/similarity.dropletfiltered.csv' returned non-zero exit status 134.

Traceback (most recent call last):
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/bin/dnbc4tools", line 8, in <module>
    sys.exit(main())
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/dnbc4tools.py", line 110, in main
    args.func(args)
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/run.py", line 144, in run
    Runpipe(args).runpipe()
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/run.py", line 131, in runpipe
    start_print_cmd(pipecmd,os.path.join(self.outdir,self.name))
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/tools/utils.py", line 138, in start_print_cmd
    subprocess.check_call(arg, shell=True)
  File "/home/data/t020559/miniconda3/envs/dnbc4tools/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/home/data/t020559/miniconda3/envs/dnbc4tools/bin/dnbc4tools rna count --name scRNA_pool6 --calling_method emptydrops --expectcells 3000 --threads 8 --outdir /home/data/t020559/DNBelab_test/PRJCA021248/st3_outs' returned non-zero exit status 1.

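What puzzles me is that I ran four samples at the same time and only this one failed. The other three produced results normally. Can you tell where the problem is?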

lishuangshuang0616 commented 2 months ago

Please post a screenshot of the contents of 01.data/beads_stat.txt and one of the 01.data directory.

yjzhang1020 commented 2 months ago

This is the content of the first 10 lines of the file 01.data/beads_stat.txt. image.png

This is 01.data directory. image.png

yjzhang1020 commented 2 months ago

I have another question. The 'oligofastq' is a mandatory parameter, but the public dataset only provides R1 (30bp) and R2 (100bp) files. To make the program run, I have input the R1 file for both the 'oligofastq1' and 'oligofastq2' parameters. This way, the program can run normally and finish, and the output matrix can be used for subsequent analysis (only one sample failed). My question is, does the R1 library contain the oligo library information? Is it correct to run the program in this manner?

lishuangshuang0616 commented 2 months ago

There are two pairs of sequences in the public data. The one with more data is the cDNA library, and the one with less data is the oligo library. They need to be distinguished.
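The size rule of thumb above can be sketched in a few lines of Python. This is only a minimal illustration of that heuristic, assuming the two FASTQ pairs have already been downloaded; the function names and pair labels are hypothetical, and dnbc4tools itself performs no such check.

```python
import os
from typing import Dict, List


def total_size(paths: List[str]) -> int:
    """Sum the on-disk sizes (in bytes) of one library's FASTQ files."""
    return sum(os.path.getsize(p) for p in paths)


def guess_libraries(pairs: Dict[str, List[str]]) -> Dict[str, str]:
    """Label the larger of two FASTQ pairs 'cDNA' and the smaller 'oligo'.

    `pairs` maps an arbitrary pair label to its R1/R2 paths.
    Assumption: exactly two pairs belong to the sample, as in the thread.
    """
    if len(pairs) != 2:
        raise ValueError("expected exactly two FASTQ pairs")
    sizes = {label: total_size(paths) for label, paths in pairs.items()}
    larger, smaller = sorted(sizes, key=sizes.get, reverse=True)
    if sizes[larger] == sizes[smaller]:
        raise ValueError("both pairs have the same size; cannot tell them apart")
    return {larger: "cDNA", smaller: "oligo"}
```

If a sample has only one FASTQ pair, as with the HRR1445549 files in this issue, the function raises an error rather than guessing, which matches the point that both libraries must be present.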

yjzhang1020 commented 2 months ago

Yes, I understand that they need to be distinguished. For example, the public dataset only provides HRR1445549_f1.fq.gz (30bp) and HRR1445549_r2.fq.gz (100bp). The program requires --cDNAfastq1 and --cDNAfastq2, and these two files are the obvious inputs for them. However, the program also requires --oligofastq1 and --oligofastq2, and the public dataset does not provide corresponding oligo FASTQ files. So I passed HRR1445549_f1.fq.gz to both --oligofastq1 and --oligofastq2. The program runs normally, but is this the correct way to do it?

lishuangshuang0616 commented 2 months ago

Is it a dataset problem? Is there a URL?

lishuangshuang0616 commented 2 months ago

Refer to this CNGB data: https://db.cngb.org/search/sample/?q=CNP0005575

yjzhang1020 commented 2 months ago

My test data comes from: NGDC - GSA for Human. The data used is scRNA_pool5-8.

lishuangshuang0616 commented 2 months ago

I think this is not data that can be analyzed by dnbc4tools. Maybe you can refer to https://github.com/MGI-tech-bioinformatics/DNBelab_C_Series_scRNA-analysis-software

yjzhang1020 commented 1 month ago

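What I actually want to know is this: since --oligofastq is a required parameter, what role does it play in the pipeline? And how can I tell whether a dataset can be analyzed with this pipeline?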

lishuangshuang0616 commented 1 month ago

MGI's commercial single-cell RNA reagents currently produce two libraries, cDNA and oligo. The oligo library is used to merge multiple magnetic beads that were in the same droplet. Any RNA dataset that dnbc4tools can analyze therefore has both libraries.
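To make the idea of merging beads concrete, here is a toy sketch in Python. It assumes that beads sharing a droplet capture overlapping sets of oligo sequences, scores each bead pair with Jaccard similarity, and groups pairs above a threshold using union-find. The metric, the 0.5 threshold, and all names are illustrative assumptions; this is not the algorithm implemented by the `similarity` binary in dnbc4tools.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two oligo sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_beads(bead_oligos: Dict[str, Set[str]],
                threshold: float = 0.5) -> List[Set[str]]:
    """Group beads whose oligo sets are similar enough (toy model)."""
    parent = {bead: bead for bead in bead_oligos}

    def find(x: str) -> str:
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(bead_oligos, 2):
        if jaccard(bead_oligos[a], bead_oligos[b]) >= threshold:
            parent[find(a)] = find(b)  # union the two groups

    groups: Dict[str, Set[str]] = {}
    for bead in bead_oligos:
        groups.setdefault(find(bead), set()).add(bead)
    return list(groups.values())


# Hypothetical input: bead_A and bead_B share most oligos, bead_C shares none.
beads = {
    "bead_A": {"o1", "o2", "o3"},
    "bead_B": {"o1", "o2", "o3", "o4"},
    "bead_C": {"o9"},
}
droplets = merge_beads(beads)
```

In this toy input, bead_A and bead_B end up in one group and bead_C stays alone. Without an oligo library there are no oligo sets to compare, which is consistent with the statement that a dataset needs both libraries to be analyzable.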

XiaomengXu5 commented 1 month ago

ref to this cngb data. https://db.cngb.org/search/sample/?q=CNP0005575

I checked the two sets of sequencing data at your link, and I could not find an oligo fq file in either. Or did I miss something? What should my command look like if I need to process this data?

$dnbc4tools rna run \
  --cDNAfastq1 /path/to/E100062880_L01_11_1.fq.gz \
  --cDNAfastq2 /path/to/E100062880_L01_11_1.fq.gz \
  --oligofastq1 ? \
  --oligofastq2 ? \
  --genomeDir /database/scRNA/Mus_musculus/mm10 \
  --name test --threads 10

yjzhang1020 commented 1 month ago

Unfortunately, I could not find the corresponding oligo fq for the reference data the author linked either. So I am still confused about which data this pipeline can process and which it cannot.

lishuangshuang0616 commented 1 month ago

Barcode 3 is the cDNA fastq, and barcode 11 is the oligo fastq.
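Given that mapping, the command line could be assembled from the file names along the following lines. This is a hedged sketch: it assumes the files follow a `<flowcell>_<lane>_<barcode>_<read>.fq.gz` naming layout (inferred from E100062880_L01_11_1.fq.gz in the thread), the barcode 3 file names are hypothetical, and `build_command` is an illustrative helper rather than part of dnbc4tools. The flags themselves are the ones used earlier in this issue.

```python
import re
import shlex
from typing import Dict, List, Tuple

# Assumed file-name layout: <flowcell>_<lane>_<barcode>_<read>.fq.gz
NAME_RE = re.compile(
    r"^(?P<flowcell>[^_]+)_(?P<lane>L\d+)_(?P<barcode>\d+)_(?P<read>[12])\.fq\.gz$"
)


def build_command(files: List[str], cdna_barcode: int, oligo_barcode: int,
                  genome_dir: str, name: str, threads: int) -> str:
    """Return a `dnbc4tools rna run` command line with files chosen by barcode."""
    by_key: Dict[Tuple[int, str], str] = {}
    for path in files:
        match = NAME_RE.match(path.rsplit("/", 1)[-1])
        if match:
            by_key[(int(match["barcode"]), match["read"])] = path

    def pick(barcode: int, read: str) -> str:
        try:
            return by_key[(barcode, read)]
        except KeyError:
            raise ValueError(f"no file for barcode {barcode}, read {read}") from None

    cmd = [
        "dnbc4tools", "rna", "run",
        "--cDNAfastq1", pick(cdna_barcode, "1"),
        "--cDNAfastq2", pick(cdna_barcode, "2"),
        "--oligofastq1", pick(oligo_barcode, "1"),
        "--oligofastq2", pick(oligo_barcode, "2"),
        "--genomeDir", genome_dir,
        "--name", name,
        "--threads", str(threads),
    ]
    return shlex.join(cmd)  # requires Python 3.8+


# Hypothetical file list; only the barcode 11 names appear in the thread.
example_files = [
    "/path/to/E100062880_L01_3_1.fq.gz",
    "/path/to/E100062880_L01_3_2.fq.gz",
    "/path/to/E100062880_L01_11_1.fq.gz",
    "/path/to/E100062880_L01_11_2.fq.gz",
]
command = build_command(example_files, cdna_barcode=3, oligo_barcode=11,
                        genome_dir="/database/scRNA/Mus_musculus/mm10",
                        name="test", threads=10)
print(command)
```

Under this reading, the E100062880_L01_11 files passed as cDNA in the question above would instead belong under --oligofastq1 and --oligofastq2. The helper raises a ValueError when a library is missing, so a dataset with only one FASTQ pair fails early instead of reusing the same file for both libraries.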