eldariont / svim

Structural Variant Identification Method using Long Reads
GNU General Public License v3.0

Long Walltime? #7

Closed dantaki closed 5 years ago

dantaki commented 5 years ago

I have a 7x WGS ONP genome and my jobs crashed after 64 hours. Is this normal? I started from raw FASTQs and used default parameters apart from the nanopore option.

eldariont commented 5 years ago

Hi Danny, what you describe doesn't sound normal. Aligning and analyzing 7x (human) coverage of reads shouldn't take that long. Can you share more information about the crash? What was your precise command, and what did the command-line output look like? Can you share the log file that SVIM created in the working directory you gave? Cheers, David

dantaki commented 5 years ago

$ svim reads --nanopore /scratch/nanopore/proband_svim/ /storage/nanopore/proband.onp.fq /storage/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa

Log file:

2018-12-14 15:43:05,930 [INFO   ]  ****************** Start SVIM, version 0.4.2 ******************
2018-12-14 15:43:05,931 [INFO   ]  CMD: python3 /home/dantaki/anaconda3/bin/svim reads --nanopore /scratch/nanopore/proband_svim/ /storage/nanopore/proband.onp.fq /storage/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa
2018-12-14 15:43:05,931 [INFO   ]  WORKING DIR: /scratch/nanopore/proband_svim
2018-12-14 15:43:05,931 [INFO   ]  PARAMETER: sub, VALUE: reads
2018-12-14 15:43:05,931 [INFO   ]  PARAMETER: working_dir, VALUE: /scratch/nanopore/proband_svim/
2018-12-14 15:43:05,931 [INFO   ]  PARAMETER: reads, VALUE: /storage/nanopore/proband.onp.fq
2018-12-14 15:43:05,931 [INFO   ]  PARAMETER: genome, VALUE: /storage/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa
2018-12-14 15:43:05,931 [INFO   ]  PARAMETER: min_mapq, VALUE: 20
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: min_sv_size, VALUE: 40
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: max_sv_size, VALUE: 100000
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: skip_indel, VALUE: False
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: skip_segment, VALUE: False
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: cores, VALUE: 1
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: aligner, VALUE: ngmlr
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: nanopore, VALUE: True
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: segment_gap_tolerance, VALUE: 10
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: segment_overlap_tolerance, VALUE: 5
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: partition_max_distance, VALUE: 5000
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: distance_normalizer, VALUE: 900
2018-12-14 15:43:05,932 [INFO   ]  PARAMETER: cluster_max_distance, VALUE: 0.7
2018-12-14 15:43:05,933 [INFO   ]  PARAMETER: del_ins_dup_max_distance, VALUE: 1.0
2018-12-14 15:43:05,933 [INFO   ]  PARAMETER: trans_destination_partition_max_distance, VALUE: 1000
2018-12-14 15:43:05,933 [INFO   ]  PARAMETER: trans_partition_max_distance, VALUE: 200
2018-12-14 15:43:05,933 [INFO   ]  PARAMETER: trans_sv_max_distance, VALUE: 500
2018-12-14 15:43:05,933 [INFO   ]  PARAMETER: distance_metric, VALUE: sl
2018-12-14 15:43:05,933 [INFO   ]  ****************** STEP 1: COLLECT ******************
2018-12-14 15:43:05,933 [INFO   ]  MODE: reads
2018-12-14 15:43:05,933 [INFO   ]  INPUT: /storage/nanopore/proband.onp.fq
2018-12-14 15:43:05,933 [INFO   ]  GENOME: /storage/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa
2018-12-14 15:43:05,933 [INFO   ]  Recognized reads file as FASTQ format.
2018-12-14 15:43:05,956 [INFO   ]  Starting alignment pipeline..

Looking at the STDERR: it starts with the same output as above, and then shows ngmlr aligning the reads:

ngmlr 0.2.7 (build: Jul  2 2018 10:32:15, start: 2018-12-14.15:43:05)
Contact: philipp.rescheneder@univie.ac.at
Writing output (SAM) to stdout
Reading encoded reference from /storage/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa-enc.2.ngm
Reading 3220 Mbp from disk took 0.68s
Reading reference index from /storage/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa-ht-13-2.2.ngm
Reading from disk took 3.89s
Opening query file /storage/nanopore/proband.onp.fq
Mapping reads...
Processed: 28 (0.82), R/S: 14.00, RL: 4954, Time: 31.75 2.75 37.25, Align: 0.97, 685, 0.95

...

Processed: 1278196 (0.80), R/S: 5.55, RL: 7682, Time: 6.00 3.00 29.00, Align: 1.00, 392, 0.94
Processed: 1278296 (0.80), R/S: 5.55, RL: 7681, Time: 6.00 3.00 29.00, Align: 1.00, 392, 0.94
=>> PBS: job killed: walltime 230443 exceeded limit 230400

Thanks for the help

eldariont commented 5 years ago

Okay, thanks. It looks like your cluster's scheduling system killed the job because it exceeded the allowed walltime of 64 hours. The problem isn't SVIM itself, though (it hasn't even started analyzing alignments yet), but the alignment of the reads with NGMLR. The NGMLR output in the stderr tells us that 1278296 reads were aligned in those 64 h, at an average rate of 5.55 reads per second (pretty slow...) and with an average read length of 7681 bp.
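
For anyone who wants to check these numbers against their own stderr, here is a minimal sketch in Python. It assumes the progress-line layout shown in the log above (the regular expression is my own guess at that layout, not something NGMLR documents) and takes the walltime limit from the PBS kill message.

```python
import re

# Last progress line from the NGMLR stderr quoted in this thread.
line = ("Processed: 1278296 (0.80), R/S: 5.55, RL: 7681, "
        "Time: 6.00 3.00 29.00, Align: 1.00, 392, 0.94")

# Walltime limit in seconds, taken from the PBS kill message.
WALLTIME_S = 230400

# Assumed field layout: read count, reads per second (R/S), read length (RL).
match = re.search(r"Processed: (\d+) .*?R/S: ([\d.]+), RL: (\d+)", line)
reads = int(match.group(1))
reported_rate = float(match.group(2))
read_len = int(match.group(3))

hours = WALLTIME_S / 3600                 # walltime limit in hours
derived_rate = reads / WALLTIME_S         # reads per second over the whole run
aligned_gbp = reads * read_len / 1e9      # approximate bases processed so far

print(f"walltime limit: {hours:.0f} h")
print(f"rate reported by NGMLR: {reported_rate:.2f} reads/s")
print(f"rate derived from totals: {derived_rate:.2f} reads/s")
print(f"approx. sequence processed: {aligned_gbp:.2f} Gbp")
```

The derived rate agrees with the R/S value that NGMLR prints, so the slow throughput held for the whole run rather than being a late slowdown.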

I think you have two options: either allow NGMLR to use more cores (e.g. with svim reads --cores 10), or use minimap2 for the alignment, as it is much faster than NGMLR (svim reads --aligner minimap2).
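
To get a rough feeling for how much walltime to request, here is a back-of-the-envelope sketch in Python. It rests on several assumptions: 7x coverage of the 3220 Mbp reference that NGMLR reports loading, the average read length and single-core rate from the stderr, and perfectly linear speed-up with the number of cores, which real runs will not reach. The function name `estimate_hours` is made up for illustration.

```python
# Figures taken from this thread; everything derived from them is an estimate.
REFERENCE_BP = 3220e6      # "Reading 3220 Mbp from disk" in the NGMLR stderr
COVERAGE = 7               # coverage stated by the reporter
MEAN_READ_LEN = 7681       # RL value in the last progress line
RATE_ONE_CORE = 5.55       # R/S value in the last progress line (cores=1)


def estimate_hours(cores: int) -> float:
    """Estimate alignment walltime in hours.

    Assumes ideal linear scaling with the core count, which is optimistic.
    """
    total_reads = COVERAGE * REFERENCE_BP / MEAN_READ_LEN
    seconds = total_reads / (RATE_ONE_CORE * cores)
    return seconds / 3600


for cores in (1, 10):
    print(f"{cores:>2} core(s): about {estimate_hours(cores):.0f} h")
```

Under these assumptions a single core would need well over the 64 h limit for the full dataset, while ten cores would fit comfortably, so either option should get the job through the queue limit.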

dantaki commented 5 years ago

Ok makes sense. Thanks!