BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

tgsgapcloser: line 527: 1616435 Aborted #49

LiaOb21 closed this issue 2 years ago

LiaOb21 commented 2 years ago

Hello,

Thank you so much for this useful tool. I was trying to fill the gaps in my draft assembly of Arabidopsis (an ONT assembly, polished with HiFi reads using racon and then scaffolded using 3d-dna) using HiFi reads, and I got this error:

/path/to/miniconda3/bin/tgsgapcloser  --scaff  /path/to/draft_assembly.fasta --reads  /path/to/HiFi_reads.fa --output gapclosed_ne  --ne --tgstype pb --thread 32 
INFO  :   Run tgsgapcloser from /path/to/miniconda3/bin ;
          Version : 1.1.1 ;
          Release time : 2019-12-31 .

INFO  :   Parsing args starting ...
             --scaff /path/to/draft_assembly.fasta
             --reads /path/to/HiFi_reads.fa
             --output gapclosed_ne
             --ne 
             --tgstype pb
             --thread 32

INFO  :   Parsing args end .

INFO  :   Checking basic args & env ...
              -   Will not do error-correcting by --ne option
              -   TGS reads type is pb . MINIMAP2_PARAM is  -x ava-pb   MIN_IDY is 0.2 . MIN_MATCH is 200 .

INFO  :   Checking basic args & env end.

INFO  :   Step 1 , run TGSSeqSplit to split scaffolds into contigs. 

INFO  :   Step 1 , done .

INFO  :   Step 2 , skip TGSCandidate by --ne option.

INFO  :   Step 3 , skip error-correction by --ne option.

INFO  :   Step 4 , gap filling ... 
              -   Use /path/to/HiFi_reads.fa as final TGS READS input.
              -   4,1 , mapping contig into reads ... 
              -   4,2 , extra filling seq ... 
/path/to/miniconda3/bin/tgsgapcloser: line 527: 1616435 Aborted                 (core dumped) $GapCloser --ont_reads_a $FINAL_READS --contig2ont_paf $OUT_PREFIX.fill.paf --min_match=$MIN_MATCH --min_idy=$MIN_IDY --prefix $OUT_PREFIX > $OUT_PREFIX.fill.log 2>&1

I installed tgsgapcloser using conda.

This is what I see in the .log file:

tail gapclosed_ne.fill.log 
5609    1
5724    1
5809    1

TGSGapCloser    INFO    GMT 2022/6/10   13:51:27    :   LoadPAF finish. used wall clock : 430 seconds, cpu time : 427.496643 seconds
TGSGapCloser    INFO    GMT 2022/6/10   13:51:27    :   LoadScaffInfo start now ... 
TGSGapCloser    INFO    GMT 2022/6/10   13:51:28    :   LoadScaffInfo finish. used wall clock : 1 seconds, cpu time : 1.535442 seconds
TGSGapCloser    INFO    GMT 2022/6/10   13:51:28    :   ParseAllGap start now ... 
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

These are the files that I got in the directory:

ls -ltrh

total 11G
196 Jun 10 11:15 gapclosed_ne.seq_split.log
3.0K Jun 10 11:15 gapclosed_ne.orignial_scaff_infos
1.3K Jun 10 11:15 gapclosed_ne.name_map
118M Jun 10 11:15 gapclosed_ne.contig
3.3K Jun 10 13:43 gapclosed_ne.minimap2.04.log
11G Jun 10 13:43 gapclosed_ne.fill.paf
22K Jun 10 13:59 gapclosed_ne.fill.log

It seems to be a memory issue, but we think we have enough memory. Do you have any idea what could have happened? Thank you in advance.

Cheers,

Lia

cchd0001 commented 2 years ago

Hello,

This std::bad_alloc is a common memory problem, as you said. Are any other programs running simultaneously on the same computer and consuming a lot of memory? Please re-run the program and keep an eye on the available memory.
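One way to keep an eye on available memory on a Linux node is to read /proc/meminfo while the job runs. The sketch below is only an illustration and not part of tgsgapcloser: the function names are hypothetical, and it assumes a Linux kernel that reports a MemAvailable field in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict.

    Values are in kB for the fields that carry a kB unit.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_gib(text):
    """Return MemAvailable in GiB, or None if the field is missing."""
    kb = parse_meminfo(text).get("MemAvailable")
    return None if kb is None else kb / (1024 ** 2)


def read_available_gib(path="/proc/meminfo"):
    """Read the given meminfo file (Linux only) and report available GiB."""
    with open(path) as handle:
        return available_gib(handle.read())
```

Calling read_available_gib() every few seconds from a second terminal while the gap-filling step runs should show whether free memory drops to zero just before the abort.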

Best regards

Lidong Guo

LiaOb21 commented 2 years ago

Hi Lidong,

Thank you so much for your reply. May I ask what the minimum memory requirements are? I am currently running the software on a large cluster, so it seems quite strange that there is not enough memory, unless a really huge amount is required. The Arabidopsis genome is quite small, but the software always crashes at the same step. Is there anything in the script that I can change to solve the issue? Thank you so much in advance.

Cheers,

Lia

cchd0001 commented 2 years ago

Hi Lia,

One of our test datasets, which generates a 5.5 GB xxx.paf, records a 32 GB peak memory. Since your gapclosed_ne.fill.paf is 11 GB, I think a safe memory estimate is 100 GB.
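These two numbers suggest a rough rule of thumb. As a sketch only: assuming peak memory grows roughly linearly with the size of the PAF file (an assumption, not something the tool documents), the reference point of a 5.5 GB PAF with a 32 GB peak gives about 64 GB for an 11 GB PAF. The 1.5x safety factor below is likewise an assumed margin, chosen because it lands close to the 100 GB figure; the function name is hypothetical.

```python
def estimate_peak_memory_gb(paf_size_gb, ref_paf_gb=5.5, ref_peak_gb=32.0,
                            safety_factor=1.5):
    """Extrapolate peak memory (GB) from PAF size (GB).

    Assumes linear scaling from one reference run (5.5 GB PAF, 32 GB peak)
    and multiplies by an assumed safety margin.
    """
    if paf_size_gb < 0 or ref_paf_gb <= 0:
        raise ValueError("sizes must be positive")
    linear_estimate = ref_peak_gb * paf_size_gb / ref_paf_gb
    return linear_estimate * safety_factor


# An 11 GB PAF: 64 GB by linear scaling, 96 GB with the 1.5x margin.
print(estimate_peak_memory_gb(11.0))
```

Treat the result as a starting value for a cluster memory request, not as a guarantee.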

If your genome is small, then I guess the coverage of your HiFi reads is relatively high? Reducing read coverage is also an effective way to reduce memory usage.
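Reducing coverage can be done by randomly subsampling the reads before running the tool. The following is a minimal sketch, not the tool's own method: the function names, the target coverage, and the seed are all hypothetical, and it assumes an uncompressed FASTA file of reads such as the one passed to --reads.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def keep_fraction(total_read_bases, genome_size, target_coverage):
    """Fraction of reads to keep so that coverage is about target_coverage."""
    if total_read_bases <= 0 or genome_size <= 0:
        raise ValueError("total_read_bases and genome_size must be positive")
    current_coverage = total_read_bases / genome_size
    return min(1.0, target_coverage / current_coverage)


def subsample_fasta(in_path, out_path, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    Returns the number of reads written. A fixed seed makes the
    selection reproducible.
    """
    rng = random.Random(seed)
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            if rng.random() < fraction:
                dst.write(f">{header}\n{seq}\n")
                kept += 1
    return kept
```

For example, with made-up numbers, 6 Gbp of reads on a 120 Mbp genome is 50x coverage, so keep_fraction(6_000_000_000, 120_000_000, 25) returns 0.5 and roughly half of the reads would be written to the new file, which is then passed to --reads instead of the full set.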

Best regards

Lidong Guo

LiaOb21 commented 2 years ago

Hi Lidong,

Thank you so much, it solved the issue.

Best regards,

Lia