broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0
340 stars 60 forks source link

Absurdly long run time and GCLocker #132

Open NTNguyen13 opened 3 years ago

NTNguyen13 commented 3 years ago

Hi, I'm running Pilon on a genome with size of ~200MB, using 50X short read (around 140GB) for polishing. I tried pilon multiple time with different genomes of similar size, using the same short read set. Most of the time it runs fine, however, sometimes it run absurdly long. Normally, it takes around 2-3 hours for pilon polishing, however for long run time case, it takes around 1 day for just 1/6 of the file (I checked the size of the log files). For those runs, I found many warning like this in the log:

[8055.013s][warning][gc,alloc] ForkJoinPool-2-worker-3: Retried waiting for GCLocker too often allocating 917113 words
[8230.918s][warning][gc,alloc] ForkJoinPool-2-worker-61: Retried waiting for GCLocker too often allocating 1050073 wor
ds
[8230.918s][warning][gc,alloc] ForkJoinPool-2-worker-55: Retried waiting for GCLocker too often allocating 1186230 wor
ds
[8711.626s][warning][gc,alloc] ForkJoinPool-2-worker-17: Retried waiting for GCLocker too often allocating 417876 word
s
[9526.500s][warning][gc,alloc] ForkJoinPool-2-worker-15: Retried waiting for GCLocker too often allocating 1065415 wor
ds
[9574.657s][warning][gc,alloc] ForkJoinPool-2-worker-49: Retried waiting for GCLocker too often allocating 1256813 wor
ds
[9654.760s][warning][gc,alloc] ForkJoinPool-2-worker-5: Retried waiting for GCLocker too often allocating 671948 words
seq_20268_pilon:1-2000 log:

The header of log file for long run time case:

Pilon version 1.23 Mon Nov 26 16:04:05 2018 -0500
Genome: TGS.ont.1.0.fasta
Fixing snps, indels, gaps, local
Input genome size: 218041916
Scanning BAMs
TGS.sort.bam: 1862808264 reads, 834976541 filtered, 370907648 mapped, 14439530 proper, 231333666 stray, FR 50% 1870+/-1405, RF 50% 1918+/-1436, max 6227 jumps

The header of log file for 'normal' run time case:

Pilon version 1.23 Mon Nov 26 16:04:05 2018 -0500
Genome: TGS.ont.0.0.fasta
Fixing snps, indels, gaps, local
Input genome size: 218016064
Scanning BAMs
TGS.sort.bam: 1858447661 reads, 827198042 filtered, 369178380 mapped, 7534132 proper, 232707754 stray, FR 50% 1569+/-1310, RF 50% 1583+/-1324, max 5553 jumps

I used this command

java -Xmx200G -jar  $PILON --fix all \
                            --genome TGS.ont.0.0.fasta --bam TGS.sort.bam \
                            --output TGS.ont.0.0.pilon --outdir ./ \
                            --diploid  --threads 32 > TGS.0.0.pilon.log

Could you please help me to find what causes this behavior? Thank you very much.

sfbayman commented 1 year ago

@NTNguyen13 hi, did you find the root cause for the messages "Retried waiting for GCLocker too often allocating"?