MGI-tech-bioinformatics / DNBelab_C_Series_HT_scRNA-analysis-software

An open-source and flexible pipeline for analyzing high-throughput DNBelab C Series single-cell RNA datasets
MIT License

Asking the technical team again: roughly how much memory does the rna run step need? #44

Closed · tyyl622 closed this issue 9 months ago

tyyl622 commented 9 months ago

Hello technical team, this is my run command:

```shell
raw_dir=/public/home/tangy/Brain/0.rawdat
data=/public/home/tangy/Brain/database/
soft=/public/home/tangy/miniconda3/bin/DNBelab_C_Series_HT_scRNA-analysis-software
dnbc4=/public/home/tangy/miniconda3/envs/dnbc4tools/bin/dnbc4tools

singularity exec ${soft}/dnbc4tools.sif ${dnbc4} rna run \
    --cDNAfastq1 ${raw_dir}/H3HIP/7777-1-231101/cDNA-7777-1-231101/FP270003949_L01_66_1.fq.gz,${raw_dir}/H3HIP/7777-1-231101/cDNA-7777-1-231101/FP270003950_L01_66_1.fq.gz \
    --cDNAfastq2 ${raw_dir}/H3HIP/7777-1-231101/cDNA-7777-1-231101/FP270003949_L01_66_2.fq.gz,${raw_dir}/H3HIP/7777-1-231101/cDNA-7777-1-231101/FP270003950_L01_66_2.fq.gz \
    --oligofastq1 ${raw_dir}/H3HIP/7777-1-231101/oligo-7777-1-231101/DP8480001966BR_L01_87_1.fq.gz \
    --oligofastq2 ${raw_dir}/H3HIP/7777-1-231101/oligo-7777-1-231101/DP8480001966BR_L01_87_2.fq.gz \
    --genomeDir ${data} \
    --name H3HIP_1 --threads 20 \
    --outdir /public/home/tangy/Brain/output
```

Our server has roughly 400 GB+ of memory. After submitting this job, I observed that available memory kept decreasing and the buffer/cache memory never seemed to be released; towards the end of the run it fails because memory cannot be allocated. The error log and output are shown below. Is this a problem with our server, or is there some other way to reduce the memory footprint?

Error:

```
Matplotlib is building the font cache; this may take a moment.
Traceback (most recent call last):
  File "/opt/conda/envs/dnbc4tools/bin/dnbc4rna", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/dnbc4rna.py", line 38, in main
    args.func(args)
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/count.py", line 220, in count
    Count(args).run()
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/count.py", line 59, in run
    oligo_combine(f"{__root_dir}/config/oligo_type.json",
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/src/oligo_filter.py", line 333, in oligo_combine
    process_directory('%s/temp'%outdir, whitelist1, whitelist2, whitelist1_distance, whitelist2_distance, '%s/temp'%outdir)
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/src/oligo_filter.py", line 78, in process_directory
    pool = multiprocessing.Pool()
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/conda/envs/dnbc4tools/lib/python3.8/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Traceback (most recent call last):
  File "/public/home/tangy/miniconda3/envs/dnbc4tools/bin/dnbc4tools", line 8, in <module>
    sys.exit(main())
  File "/public/home/tangy/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/dnbc4tools.py", line 58, in main
    args.func(args)
  File "/public/home/tangy/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/run.py", line 105, in run
    Runpipe(args).runpipe()
  File "/public/home/tangy/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/rna/run.py", line 92, in runpipe
    start_print_cmd(pipecmd,os.path.join(self.outdir,self.name))
  File "/public/home/tangy/miniconda3/envs/dnbc4tools/lib/python3.8/site-packages/dnbc4tools/tools/utils.py", line 38, in start_print_cmd
    subprocess.check_call(arg, shell=True)
  File "/public/home/tangy/miniconda3/envs/dnbc4tools/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'dnbc4rna count --name H3HIP_1 --calling_method emptydrops --expectcells 3000 --forcecells 0 --minumi 1000 --threads 20 --outdir /public/home/tangy/Brain/output' returned non-zero exit status 1.
```

Output: [screenshot: output]
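As an aside, the "available memory keeps shrinking" observation above can be reproduced by polling `/proc/meminfo` while the pipeline runs. This is a hypothetical helper, not part of dnbc4tools, and it assumes a Linux host:

```python
import time

def mem_available_kb():
    # MemAvailable already accounts for reclaimable buffers/cache,
    # so it is a better signal than "free" memory alone.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])

# Print a timestamped sample once a minute; stop with Ctrl-C.
while True:
    print(time.strftime("%H:%M:%S"), mem_available_kb(), "kB available")
    time.sleep(60)
```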

lishuangshuang0616 commented 9 months ago

With 400 GB of memory it should be more than sufficient; typically 50 GB is ample for a single mouse sample. The issue may be caused by other concurrent analyses consuming memory. I recommend reanalyzing. Since the alignment has already completed, add `--process count,analysis,report` after your `--outdir /public/home/tangy/Zbtb18Brain/output` to start a fresh analysis from the count step.

tyyl622 commented 9 months ago

> With 400 GB of memory it should be more than sufficient; typically 50 GB is ample for a single mouse sample. The issue may be caused by other concurrent analyses consuming memory. I recommend reanalyzing. Since the alignment has already completed, add `--process count,analysis,report` after your `--outdir /public/home/tangy/Zbtb18Brain/output` to start a fresh analysis from the count step.

Dear Li, I followed your suggestion and then got the same error... I noticed that about 360 GB of memory was still free when the task stopped. How can I fix this? [screenshot: cache3]

The error was the same as before; the traceback was identical to the one above and again ended with:

```
OSError: [Errno 12] Cannot allocate memory
...
subprocess.CalledProcessError: Command 'dnbc4rna count --name H3HIP_1 --calling_method emptydrops --expectcells 3000 --forcecells 0 --minumi 1000 --threads 20 --outdir /public/home/tangy/Zbtb18Brain/output' returned non-zero exit status 1.
```

lishuangshuang0616 commented 9 months ago

Try running with `--threads 5` first. @tyyl622

lishuangshuang0616 commented 9 months ago

Please post a screenshot of the contents of the Index_sequencing.report file so I can take a look.

tyyl622 commented 9 months ago

> Please post a screenshot of the contents of the Index_sequencing.report file so I can take a look.

[screenshot: Index_sequencing.report]

lishuangshuang0616 commented 9 months ago

If `--threads 5` still fails, truncate Index_reads.fq.gz down to about 100 MB of reads and then rerun the analysis with `run --process count,analysis,report`.
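For the truncation step, here is a minimal sketch (not part of dnbc4tools; the file names and read count are illustrative) that keeps the first N complete records of a gzipped FASTQ:

```python
import gzip

# Keep the first N_READS complete FASTQ records (4 lines each).
# N_READS is an assumption; adjust it until the output file is
# roughly the ~100 MB suggested above.
N_READS = 1_000_000

with gzip.open("Index_reads.fq.gz", "rt") as src, \
     gzip.open("Index_reads.subset.fq.gz", "wt") as dst:
    for i, line in enumerate(src):
        if i >= 4 * N_READS:
            break
        dst.write(line)
```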

tyyl622 commented 9 months ago

> If `--threads 5` still fails, truncate Index_reads.fq.gz down to about 100 MB of reads and then rerun the analysis with `run --process count,analysis,report`.

OK, I'll give it a try. Thank you! And I'm really sorry for disturbing your rest...

tyyl622 commented 9 months ago

Hello technical team. These are the bt_log and log files produced with `--threads 5` and the original Index_reads.fq.gz: [attachments: bt_log, log]. It's still the same error...

I then replaced the original ~7 GB Index_reads.fq.gz with the truncated ~100 MB one and ran:

```shell
singularity exec ${soft}/dnbc4tools.sif ${dnbc4} rna run \
    --cDNAfastq1 ... --cDNAfastq2 ... \
    --oligofastq1 ... --oligofastq2 ... \
    --genomeDir ... --name ... --threads 5 \
    --outdir ... --process count,analysis,report
```

Still the same error... The md5 checksums of the fastq and oligo files are fine. Or did I misunderstand you, and this isn't how it should be run? The traceback was identical to the one above:

```
2023-12-14 21:05:41 Calculating bead similarity and merging beads.
Traceback (most recent call last):
  ...
OSError: [Errno 12] Cannot allocate memory
...
subprocess.CalledProcessError: Command 'dnbc4rna count --name H3HIP_1 --calling_method emptydrops --expectcells 3000 --forcecells 0 --minumi 1000 --threads 5 --outdir /public/home/tangy/Brain/output' returned non-zero exit status 1.
```

lishuangshuang0616 commented 9 months ago

Yes, that's the right way to run it. Since it still reports this memory error, I'm not sure how much memory the cluster actually allocates to the job; 400 GB of memory should certainly not be a problem. Could you also try the conda version?
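One way to check how much memory the scheduler actually grants the job is to inspect the process limits and cgroup caps from inside the job itself. This is an editorial aside, not a dnbc4tools feature; the cgroup paths below assume a Linux host (v1 and v2 respectively):

```python
import resource

# Report the address-space limit imposed on this process; os.fork() can
# fail with ENOMEM when this (or a cgroup cap) is far below physical RAM.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("RLIMIT_AS soft/hard:", soft, hard)  # -1 means unlimited

# cgroup memory caps: the first path is cgroup v1, the second cgroup v2.
for path in ("/sys/fs/cgroup/memory/memory.limit_in_bytes",
             "/sys/fs/cgroup/memory.max"):
    try:
        with open(path) as f:
            print(path, "=", f.read().strip())
    except OSError:
        pass  # path absent on this cgroup version
```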

tyyl622 commented 9 months ago

> Yes, that's the right way to run it. Since it still reports this memory error, I'm not sure how much memory the cluster actually allocates to the job; 400 GB of memory should certainly not be a problem. Could you also try the conda version?

OK, I'll try a few more things. Thank you!

tyyl622 commented 9 months ago

I tried running with the conda version and got exactly the same error message... The md5 checksums of the sequencing data have been verified and are fine. I also tried running with `--threads 1`, with the same result... The traceback was identical to the earlier ones, except that the paths now point to the conda install:

```
2023-12-18 20:26:46 Calculating bead similarity and merging beads.
Traceback (most recent call last):
  File "/public/home/tangy/miniconda3/envs/dnbc4tools/bin/dnbc4rna", line 8, in <module>
  ...
OSError: [Errno 12] Cannot allocate memory
...
subprocess.CalledProcessError: Command 'dnbc4rna count --name H3HIP_1 --calling_method emptydrops --expectcells 3000 --forcecells 0 --minumi 1000 --threads 1 --outdir /public/home/tangy/Brain/output' returned non-zero exit status 1.
```

However, I noticed in the error message that the failure occurs right when Python 3.8's multiprocessing is invoked... so I'd like to ask: can dnbc4tools run single-threaded?
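This is consistent with the traceback: oligo_filter.py line 78 calls `multiprocessing.Pool()` with no `processes` argument, and Python then defaults to `os.cpu_count()` workers regardless of `--threads`. A minimal sketch of the difference follows; the explicit cap is a hypothetical workaround, not the shipped dnbc4tools code:

```python
import multiprocessing
import os

def square(x):
    return x * x

if __name__ == "__main__":
    # Pool() with no argument forks os.cpu_count() workers -- on a large
    # node that can be dozens of os.fork() calls, each of which may fail
    # with ENOMEM under strict overcommit or a tight per-job memory limit.
    print("default worker count:", os.cpu_count())

    # A capped pool keeps the number of forks independent of core count:
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(square, range(8)))
```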

By the way, when I built the reference genome with the following command, `--threads 10` worked without any problem:

```shell
$dnbc4tools rna mkref --ingtf genes.filter.gtf --fasta GRCm38.primary_assembly.genome.fa --threads 10 --species Mus_musculus
```

Alternatively, could you provide a pair of test fastq files that I could try running?

lishuangshuang0616 commented 9 months ago

This step filters the oligo data; it is not the alignment step, so it is unrelated to building the reference genome. If the analysis really won't run, you can download version 2.1.0 and analyze with that.

tyyl622 commented 9 months ago

> This step filters the oligo data; it is not the alignment step, so it is unrelated to building the reference genome. If the analysis really won't run, you can download version 2.1.0 and analyze with that.

OK, I'll download version 2.1.0 and try it. What I meant was just that multithreading works fine when building the reference genome.

tyyl622 commented 9 months ago

> This step filters the oligo data; it is not the alignment step, so it is unrelated to building the reference genome. If the analysis really won't run, you can download version 2.1.0 and analyze with that.

Version 2.1.0 works fine. I've switched to 2.1.0. Thank you!