WGLab / SeqMule

Automated human exome/genome variants detection from FASTQ files
http://seqmule.usc.edu

picard versus samtools rmdup #36

Closed kaichop closed 9 years ago

kaichop commented 9 years ago

Do some research on how much time each procedure takes on whole-genome data.

If picard is too slow, we should just switch to samtools to further improve speed; picard seems to take a lot of time.

yunfeiguo commented 9 years ago

Shown below is a breakdown of time consumption for a whole-genome data set (HiSeq-J from AllSeq.com). You are right, picard's removal of PCR duplicates is slow. I'll try SAMtools, but previously SAMtools' result was not accepted by GATK.

Alignment stats calculation is also slow. I want to write a multi-threaded version but haven't gotten around to it.

 QC assesment on BAM files:  151.9 min
 Remove duplicates:  842.8 min
 Filter BAM file:  133.7 min
 Index BAM file:  17.8 min
 GATK realignment:  2.3 min
 Apply realignment:  20.2 min
 Index BAM files:  0.5 min
 GATK HaplotypeCaller variant calling:  7.8 min
 GATK variant filtering:  5.7 min
 SAMtools variant calling:  40.1 min
 Varscan variant calling:  42.8 min
 Merge split VCF:  0.4 min
 Rename VCF:  0.0 min
 Merge split VCF:  0.3 min
 Rename VCF:  0.0 min
 Extract variants in custom regions:  0.1 min
 Extract variants in custom regions:  0.1 min
 Extract variants in custom regions:  0.1 min
 Extract consensus calls:  0.4 min
 Generate QC stat:  0.0 min
 Generate Venn digram:  0.2 min
 Generate alignment and coverage stat:  482.1 min
 Generate variant stat:  0.1 min
 Remove intermediate files:  0.0 min
 Remove temporary files:  0.0 min
 Generating html report:  0.2 min

yunfeiguo commented 9 years ago

SAMtools takes 293 min to remove PCR duplicates (versus 842.8 min with picard), and its result is now accepted by GATK. SAMtools 1.1 cannot do rmdup at the moment. Alignment stats calculation takes 514 min with SAMtools 1.1, so apparently the new version did not improve the 'depth' subprogram.
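Since the 'depth' step is the remaining bottleneck, one possible shape for the multi-threaded version mentioned earlier is to split the work by region and run one `samtools depth -r` process per region. The following is only a rough sketch, not SeqMule's implementation: the function names, the output file naming, and the `runner` hook are all hypothetical, and it assumes the BAM file is sorted and indexed so that region queries work.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def depth_command(bam_path, region):
    """Build a `samtools depth` call restricted to one region (needs an indexed BAM)."""
    return ["samtools", "depth", "-r", region, bam_path]


def run_region(bam_path, region, out_dir):
    """Write one region's depth table to its own file and return the path.

    The file naming scheme here is an assumption made for illustration.
    """
    out_path = os.path.join(out_dir, region.replace(":", "_") + ".depth.txt")
    with open(out_path, "w") as handle:
        subprocess.run(depth_command(bam_path, region), stdout=handle, check=True)
    return out_path


def parallel_depth(bam_path, regions, out_dir, workers=4, runner=run_region):
    """Run one depth job per region, at most `workers` at a time.

    Threads are enough here because the heavy work happens in the samtools
    child processes; each Python thread only waits for its child to finish.
    `runner` is a hypothetical hook so the scheduling can be tried without samtools.
    """
    regions = list(regions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        paths = list(pool.map(lambda region: runner(bam_path, region, out_dir), regions))
    return dict(zip(regions, paths))
```

A call might look like `parallel_depth("sample.bam", ["chr1", "chr2"], "depth_out")`, where the file and directory names are placeholders and the output directory already exists. Whether this actually helps depends on disk throughput, since every job reads the same BAM file.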

I will replace picard rmdup with SAMtools rmdup.
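To put the switch in perspective, the timing lines posted earlier in this thread can be tallied with a few lines of Python. This is just a small sketch for reading the log: the regular expression assumes every line has the form `step name:  N.N min`, and the helper names are made up for illustration. The step names are copied verbatim from the log.

```python
import re
from collections import defaultdict

# Timing lines copied verbatim from the breakdown posted earlier in this thread.
BREAKDOWN = """\
QC assesment on BAM files:  151.9 min
Remove duplicates:  842.8 min
Filter BAM file:  133.7 min
Index BAM file:  17.8 min
GATK realignment:  2.3 min
Apply realignment:  20.2 min
Index BAM files:  0.5 min
GATK HaplotypeCaller variant calling:  7.8 min
GATK variant filtering:  5.7 min
SAMtools variant calling:  40.1 min
Varscan variant calling:  42.8 min
Merge split VCF:  0.4 min
Rename VCF:  0.0 min
Merge split VCF:  0.3 min
Rename VCF:  0.0 min
Extract variants in custom regions:  0.1 min
Extract variants in custom regions:  0.1 min
Extract variants in custom regions:  0.1 min
Extract consensus calls:  0.4 min
Generate QC stat:  0.0 min
Generate Venn digram:  0.2 min
Generate alignment and coverage stat:  482.1 min
Generate variant stat:  0.1 min
Remove intermediate files:  0.0 min
Remove temporary files:  0.0 min
Generating html report:  0.2 min
"""

LINE_RE = re.compile(r"^\s*(?P<step>.+?):\s+(?P<minutes>\d+(?:\.\d+)?) min\s*$")


def parse_breakdown(text):
    """Sum minutes per step name; steps that appear more than once are added together."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            totals[match.group("step")] += float(match.group("minutes"))
    return dict(totals)


def rank_steps(totals, top=3):
    """Return the `top` slowest steps as (name, minutes) pairs, slowest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    totals = parse_breakdown(BREAKDOWN)
    grand_total = sum(totals.values())
    for step, minutes in rank_steps(totals):
        print(f"{step}: {minutes:.1f} min ({100 * minutes / grand_total:.1f}% of listed steps)")
    # 293 min is the SAMtools rmdup time reported in this thread.
    saved = totals["Remove duplicates"] - 293.0
    print(f"Switching rmdup would save about {saved:.1f} min")
```

On these numbers, duplicate removal and the alignment/coverage stats together account for roughly three quarters of the listed time, so those two steps are where any further speed-up has to come from.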