BigDataBiology / SemiBin

SemiBin: metagenomics binning with self-supervised deep learning
https://semibin.rtfd.io/

OSError: [Errno 12] Cannot allocate memory #106

Closed: joeyclancy closed this issue 1 year ago

joeyclancy commented 2 years ago

Hey, I have run into some new problems. When I run `SemiBin train -i ${wd}/*fa --data ${wd}/output/data.csv --data-split ${wd}/output/data_split.csv -c ${wd}/output/cannot/cannot.txt -o ${wd}/output --mode single`, some errors occurred:

```
2022-08-09 02:24:34,983 - Setting number of CPUs to 28
2022-08-09 02:24:35,034 - Running with GPU.
2022-08-09 02:24:35,680 - Start training from one sample.
2022-08-09 02:24:41,307 - Training model...
  0%|          | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/site-packages/SemiBin/error.py", line 18, in __call__
    result = self.callable(*args, **kwargs)
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/site-packages/SemiBin/utils.py", line 283, in prodigal
    subprocess.check_call(
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/subprocess.py", line 368, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/subprocess.py", line 349, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/subprocess.py", line 1754, in _execute_child
    self.pid = _posixsubprocess.fork_exec(
OSError: [Errno 12] Cannot allocate memory
...
```

I installed the package via Bioconda (SemiBin v1.0.3) and all dependencies are okay. I checked memory usage (with nvidia-smi) while running the same command again, and there seems to be free memory left.

joeyclancy commented 2 years ago

I get the same errors when I run with the CPU.

psj1997 commented 2 years ago

Hi, can you check if there is enough CPU memory?
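(Editorial note: the failure happens in `os.fork()`, so the relevant limit is host RAM or a per-process limit, not the GPU memory that nvidia-smi reports. A few standard Linux commands that may help diagnose this, as a sketch:)

```shell
# Host RAM and swap -- a fork() ENOMEM concerns system memory, not GPU memory
free -h

# Per-user limits that can also make fork() fail with errno 12
ulimit -u   # max user processes
ulimit -v   # max virtual memory in kbytes ('unlimited' is typical)

# Overcommit policy: mode 2 ("never overcommit") can make fork() of a large
# process (e.g. one holding a GPU context) fail even with free RAM
cat /proc/sys/vm/overcommit_memory
```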

joeyclancy commented 2 years ago

Yes, the GPU/CPU both have nearly 70% of memory free. Somehow, the training process finished successfully on the GPU, while similar errors occurred when I ran the binning process (same with the CPU):

```
Traceback (most recent call last):
  File "/public/home/hymeta/anaconda3/envs/semibin/bin/SemiBin", line 10, in <module>
    sys.exit(main())
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/site-packages/SemiBin/main.py", line 1051, in main
    binning(logger, args.num_process, args.data, args.max_edges,
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/site-packages/SemiBin/main.py", line 820, in binning
    cluster(
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/site-packages/SemiBin/cluster.py", line 174, in cluster
    result = run_infomap(g,
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/site-packages/SemiBin/cluster.py", line 20, in run_infomap
    with multiprocessing.Pool(num_process) as p:
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/public/home/hymeta/anaconda3/envs/semibin/lib/python3.9/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
```

Still, there is enough memory left. I find that far more threads are spawned than the number I set; it seems useless to set `-p`/`--processes`/`--threads`/`-t` (when I set GPU mode, all CPUs were also loaded).

My command is `SemiBin bin -i ${wd}/*fa --model ${wd}/output/model.h5 --data ${wd}/output/data.csv -o ${wd}/output --engine gpu`

psj1997 commented 2 years ago

If you set `-t 1`, do you still get this error?

Sincerely Shaojun

joeyclancy commented 2 years ago

Thanks Shaojun, no error occurred. Binning runs well unless I set `-t` > 3.

Sincerely Joey

huizhen-yan commented 1 year ago

Hi, I got the same error even when I set `-t 1`.

```
(SemiBin) [zhanglab@biostack SemiBinyhzm7]$ SemiBin single_easy_bin -i /project/project_data/YHZ/yhzm7/yhzm7_megahit/final.contigs.fa -b /project/project_data/YHZ/yhzm7/yhzm7.S4R1.sort.bam --environment ocean -t 1 -r /project/project_data/YHZ/yhzm7/SemiBinyhzm7/SemiBinGTDB -o semibin_singleS4R1_output
2022-10-23 20:37:05 biostack SemiBin[673793] INFO Did not detect GPU, using CPU.
2022-10-23 20:37:13 biostack SemiBin[673793] INFO Generate training data.
2022-10-23 20:37:13 biostack SemiBin[673793] INFO Calculating coverage for every sample.
2022-10-23 20:47:06 biostack SemiBin[673793] INFO Processed:/project/project_data/YHZ/yhzm7/yhzm7.S4R1.sort.bam
2022-10-23 21:25:31 biostack SemiBin[673793] INFO Start binning.
2022-10-24 07:09:16 biostack SemiBin[673793] INFO Number of bins prior to reclustering: 798
Traceback (most recent call last):
  File "/biostack/bioconda/envs/SemiBin/bin/SemiBin", line 10, in <module>
    sys.exit(main())
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/site-packages/SemiBin/main.py", line 1178, in main
    single_easy_binning(
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/site-packages/SemiBin/main.py", line 956, in single_easy_binning
    binning(logger, args.num_process, data_path,
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/site-packages/SemiBin/main.py", line 885, in binning
    cluster(
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/site-packages/SemiBin/cluster.py", line 228, in cluster
    seeds = cal_num_bins(
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/site-packages/SemiBin/utils.py", line 382, in cal_num_bins
    contig_output = run_orffinder(fasta_path, num_process, tdir)
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/site-packages/SemiBin/utils.py", line 316, in run_prodigal
    with LoggingPool(num_process) if num_process != 0 else LoggingPool() as pool:
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/multiprocessing/pool.py", line 215, in __init__
    self._repopulate_pool()
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/multiprocessing/pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/multiprocessing/pool.py", line 329, in _repopulate_pool_static
    w.start()
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/biostack/bioconda/envs/SemiBin/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
(SemiBin) [zhanglab@biostack SemiBinyhzm7]$
```

luispedro commented 1 year ago

After looking at this in detail, I think the code should not call multiprocessing and instead just manage calling the different prodigal processes directly.

As a stop-gap, using threading would probably solve the immediate problem as well with minimal code changes (and the extra computational cost of creating threads is negligible).
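(Editorial note: the threading stop-gap could look roughly like this. This is a hypothetical sketch, not the actual SemiBin code; the placeholder commands stand in for the per-chunk prodigal invocations that the pool runs via `subprocess.check_call`, as in the tracebacks above.)

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool  # threads: no fork() per pool worker

def run_one(cmd):
    # Each thread only fork/execs the external process (prodigal, in SemiBin's
    # case); the thread itself does not duplicate the parent's address space,
    # which is what makes the forking Pool fail under tight memory.
    subprocess.check_call(cmd)
    return cmd

# Placeholder commands standing in for per-chunk prodigal command lines
cmds = [[sys.executable, '-c', 'pass'] for _ in range(4)]
with ThreadPool(processes=2) as pool:
    results = pool.map(run_one, cmds)
print(len(results))  # 4
```

`ThreadPool` has the same interface as `multiprocessing.Pool`, so this is close to a drop-in change; the child `prodigal` processes are still separate OS processes either way.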

huizhen-yan commented 1 year ago

I didn't fully understand "just manage calling the different prodigal processes directly", but I re-ran SemiBin on single samples one by one, and it went well.

```
(SemiBin) [zhanglab@biostack SemiBinS4R1]$ SemiBin single_easy_bin -i /project/project_data/YHZ/yhzm7/megahit_S4R1/megahit_S4R1/final.contigs_S4R1.fa -b /project/project_data/YHZ/yhzm7/SemiBinS4R1/S4R1.sort.bam --environment ocean -t 1 -r /project/project_data/YHZ/yhzm7/SemiBinyhzm7/SemiBinGTDB -o semibin_singleS4R1_output
2022-10-25 10:48:42 biostack SemiBin[918925] INFO Did not detect GPU, using CPU.
2022-10-25 10:48:42 biostack SemiBin[918925] INFO Generate training data.
2022-10-25 10:48:43 biostack SemiBin[918925] INFO Calculating coverage for every sample.
2022-10-25 10:56:18 biostack SemiBin[918925] INFO Processed:/project/project_data/YHZ/yhzm7/SemiBinS4R1/S4R1.sort.bam
2022-10-25 11:00:56 biostack SemiBin[918925] INFO Start binning.
2022-10-25 11:08:02 biostack SemiBin[918925] INFO Number of bins prior to reclustering: 162
/biostack/bioconda/envs/SemiBin/lib/python3.10/site-packages/SemiBin/utils.py:266: FutureWarning: In a future version of pandas all arguments of StringMethods.split except for the argument 'pat' will be keyword-only.
  data['bin'] = data['orf'].str.split('.',0, expand=True)[0]
2022-10-25 11:51:38 biostack SemiBin[918925] INFO Number of bins after reclustering: 170
2022-10-25 11:51:38 biostack SemiBin[918925] INFO Binning finish.
If you find SemiBin useful, please cite:
Pan, S., Zhu, C., Zhao, XM. et al. A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments. Nat Commun 13, 2326 (2022). https://doi.org/10.1038/s41467-022-29843-y.
(SemiBin) [zhanglab@biostack SemiBinS4R1]$
```

Genome completeness and contamination were estimated using CheckM v1.1.3. I found that the quality of archaeal genomes (my research focus) did not improve significantly compared to MetaBAT2. So I wonder: if SemiBin could be made to use only archaeal reference genomes from GTDB when training on marine samples, would this help improve archaeal bin quality?

luispedro commented 1 year ago

Sorry for the confusion, this was meant for @psj1997 (I should have tagged him explicitly). Sorry to hear that there were no significant improvements; I am not sure that using only archaeal genomes would help, though. There may just not be enough of them in the databases to help.

If you can afford the computational cost, using multiple samples does seem to help produce more HQ bins.

@psj1997 : In line https://github.com/BigDataBiology/SemiBin/blob/aaa7b2bfc8a13cc86337d2cbb6c36ccae45b2127/SemiBin/utils.py#L316, instead of a multiprocessing.Pool this should just be a loop creating subprocess.Popen objects and then waiting for each one. There is no need for multiprocessing (which is creating problems)
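(Editorial note: the suggested Popen-loop replacement could be sketched as below. This is hypothetical illustration code, not the actual patch; `run_all` and `max_parallel` are made-up names, and the placeholder commands stand in for per-chunk prodigal command lines.)

```python
import subprocess
import sys

def run_all(cmds, max_parallel=4):
    """Launch each command with plain subprocess.Popen, keeping at most
    max_parallel children alive at once, and wait for every one to finish.
    No multiprocessing: the parent never fork()s a Python worker, it only
    fork/execs the small external processes themselves."""
    running = []
    codes = []
    for cmd in cmds:
        if len(running) >= max_parallel:
            # Wait for the oldest child before launching another
            codes.append(running.pop(0).wait())
        running.append(subprocess.Popen(cmd))
    codes.extend(p.wait() for p in running)
    return codes

# Placeholder commands standing in for prodigal invocations
codes = run_all([[sys.executable, '-c', 'pass'] for _ in range(6)], max_parallel=3)
print(codes)  # [0, 0, 0, 0, 0, 0]
```

Real code would additionally raise on a non-zero exit code (as `check_call` does), but this shows the shape: a plain loop over `Popen` objects needs no worker processes at all.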

huizhen-yan commented 1 year ago

Thanks for your reply. I need to complete the project with the available methods first; then I will try to find another server that supports a GPU. More importantly, I will test SemiBin binning by training new models for specific microbial taxa.