ExaScience / elprep

elPrep: a high-performance tool for analyzing sequence alignment/map files in sequencing pipelines.

How to call population snps/indels? #43

Closed xiekunwhy closed 3 years ago

xiekunwhy commented 3 years ago

Hi,

I have 2 questions about haplotypecaller.

1) How do I execute only the haplotypecaller function? I have analysis-ready BAMs and don't need any other preprocessing step.

2) How do I use elprep to call population SNPs/indels (multiple BAMs as input, a single VCF as output)?

Best, Kun

pcostanza commented 3 years ago
  1. You need to call either elprep filter or elprep sfm with the input BAM file, the output BAM file, and the haplotypecaller option --haplotypecaller vcf-file. Other required and optional arguments steer the variant calling; search the main elPrep page for "--haplotypecaller vcf-file" to find the section with more details. The simplest call looks like this: elprep filter input.bam output.bam --haplotypecaller variants.vcf.gzip --reference hg38.elfasta. (You also need to convert the reference to the elfasta format; see the documentation for more details.)
  2. If you are talking about joint genotyping, we don't support that yet. If you need this, feel free to contact us by email.
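
For reference, the simplest call from point 1 can be wrapped in a small dry-run helper that prints the command rather than running it. This is only a sketch: the function name build_elprep_cmd and all file names are hypothetical, and only the flags already quoted above are used. The conversion command mentioned in the comment (elprep fasta-to-elfasta) is given as I recall it from the elPrep documentation, so please verify it against your installed version.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) an elprep call that
# requests variant calling with the flags quoted in the answer above.
# Arguments: input BAM, output BAM, output VCF, reference in elfasta format.
build_elprep_cmd() {
  input_bam="$1"
  output_bam="$2"
  vcf_file="$3"
  reference="$4"
  printf 'elprep filter %s %s --haplotypecaller %s --reference %s\n' \
    "$input_bam" "$output_bam" "$vcf_file" "$reference"
}

# The reference must already be in elfasta format. As far as I recall, the
# conversion is done with "elprep fasta-to-elfasta hg38.fasta hg38.elfasta"
# (assumption: check the documentation).
build_elprep_cmd input.bam output.bam variants.vcf.gzip hg38.elfasta
```

To actually run the command, one could pipe the printed line to sh or copy it to the terminal once the file names are adjusted.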
xiekunwhy commented 3 years ago

Hi @pcostanza ,

Thank you for your answers. However, I found that HaplotypeCaller in GATK 4.2.0 is faster and uses less memory (I can limit it with the Java -Xmx option) than elprep sfm with the haplotypecaller option, using the same number of threads (4). elprep also has no option to limit memory usage, so I am giving up on using elprep in my future work.

Best, Kun

pcostanza commented 3 years ago

Hi @xiekunwhy ,

Yes, we know that some steps in elPrep can be slower than other tools when compared in isolation. However, elPrep's benefit comes from combining multiple steps in a single command-line invocation, which allows elPrep to merge and fuse their computations. In that case, elPrep runs faster than performing multiple command-line invocations of other tools one after the other, even when those tools are highly optimized. We have reported this in our papers on elPrep. We understand that elPrep is not always applicable, but it is worth considering for a complete pipeline, especially because it produces the same output.
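
As an illustration of what "multiple steps combined from a single command-line invocation" can look like, here is a dry-run sketch that only prints the command. All file names are placeholders, and the step flags other than --haplotypecaller and --reference (--mark-duplicates, --sorting-order, --bqsr, --known-sites, --nr-of-threads) are written as I recall them from the elPrep README, so treat them as assumptions and check them against your version.

```shell
#!/bin/sh
# Dry run: assemble one combined elprep invocation and print it.
# File names are placeholders; flag names are assumptions to be verified
# against the documentation of the installed elPrep version.
sample="sample"
reference="hg38.elfasta"
known_sites="dbsnp.elsites"

set -- elprep sfm "$sample.bam" "$sample.out.bam" \
  --mark-duplicates \
  --sorting-order coordinate \
  --bqsr "$sample.recal" \
  --known-sites "$known_sites" \
  --reference "$reference" \
  --haplotypecaller "$sample.vcf.gz" \
  --nr-of-threads 4

# Join the arguments into one line for inspection.
pipeline_cmd="$*"
echo "$pipeline_cmd"
# Replace the echo with "$@" to actually run the pipeline.
```

The point of the sketch is that duplicate marking, sorting, base recalibration, and variant calling are requested in one run instead of as separate tool invocations.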

Best, Pascal