ExaScience / elprep

elPrep: a high-performance tool for analyzing sequence alignment/map files in sequencing pipelines.

Proper running using sam input file? #59

Open desmodus1984 opened 2 years ago

desmodus1984 commented 2 years ago

Hi,

I read that elprep can work with .sam input, and since it does coordinate sorting, I mapped my reads to the elfasta-converted reference and used that first .sam file as input. Judging from the job log, I am a little concerned about whether the final VCF will be correct. I used the following command:

    elprep sfm AL91.sam AL91.output.bam --filter-unmapped-reads --nr-of-threads 28 --tmp-path $TMPDIR \
    --mark-duplicates --mark-optical-duplicates AL91.metrics \
    --sorting-order coordinate \
    --bqsr AL91.recal \
    --reference /users/PHS0338/jpac1984/data/myse-hapog.elfasta \
    --haplotypecaller AL91.vcf.gz

Regarding the log: I thought that for proper variant calling elprep first had to convert and sort the .sam file and only then split it. It has been ~16 hours, and the only output so far is AL91.recal; there is no AL91.metrics file yet.

Here is the log:

    elprep version 5.1.1 compiled with go1.16.7 - see http://github.com/exascience/elprep for more information.

    2022/01/20 20:44:07 Created log file at /users/PHS0338/jpac1984/logs/elprep/elprep-2022-01-20-20-44-07-250202704-EST.log
    2022/01/20 20:44:07 Command line: [elprep sfm AL91.sam AL91.output.bam --filter-unmapped-reads --nr-of-threads 28 --tmp-path /tmp/slurmtmp.17532726 --mark-duplicates --mark-optical-duplicates AL91.metrics --sorting-order coordinate --bqsr AL91.recal --reference /users/PHS0338/jpac1984/data/myse-hapog.elfasta --haplotypecaller AL91.vcf.gz]
    2022/01/20 20:44:07 Executing command: elprep sfm AL91.sam AL91.output.bam --filter-unmapped-reads --mark-duplicates --mark-optical-duplicates AL91.metrics --optical-duplicates-pixel-distance 100 --bqsr AL91.recal --reference /users/PHS0338/jpac1984/data/myse-hapog.elfasta --quantize-levels 0 --max-cycle 500 --haplotypecaller AL91.vcf.gz --sorting-order coordinate --nr-of-threads 28 --tmp-path /tmp/slurmtmp.17532726 --intermediate-files-output-prefix AL91 --intermediate-files-output-type sam
    2022/01/20 20:44:07 Splitting...
    2022/01/20 21:01:22 Filtering (phase 1)...
    2022/01/20 21:29:00 Filtering (phase 2) and variant calling...

Hopefully, I am following the proper procedure and not wasting time.

Best regards,

Juan

caherzee commented 2 years ago

Hi,

I do not see anything incorrect in the elprep command. I would suggest adding the following option: --intermediate-files-output-type bam. Currently, the intermediate files are written as SAM files, and if your input file is very large, this may slow down processing.

You may also want to add the --timed option to get more output on where the time is going.
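As a rough sketch, the original command with both suggestions applied could look like the following. It reuses the file names and reference path from the question unchanged, and it assumes that TMPDIR is set by the job scheduler (the `/tmp` fallback is only an illustrative assumption). The `elprep_cmd` wrapper function is a hypothetical helper that only prints the command so it can be inspected first.

```shell
# Dry-run sketch: prints the elprep command instead of running it.
# elprep_cmd is a hypothetical helper name, not part of elprep.
elprep_cmd() {
  # Same inputs and options as the original command; the last two
  # options are the ones suggested in this reply.
  echo elprep sfm AL91.sam AL91.output.bam \
    --filter-unmapped-reads \
    --mark-duplicates \
    --mark-optical-duplicates AL91.metrics \
    --sorting-order coordinate \
    --bqsr AL91.recal \
    --reference /users/PHS0338/jpac1984/data/myse-hapog.elfasta \
    --haplotypecaller AL91.vcf.gz \
    --nr-of-threads 28 \
    --tmp-path "${TMPDIR:-/tmp}" \
    --intermediate-files-output-type bam \
    --timed
}

# Show the command; remove the "echo" inside the function to run elprep itself.
elprep_cmd
```

Printing the command first makes it easy to confirm that the option list is what you expect before committing to another long run.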

With regard to the missing metrics file: is the above log from an elPrep job that had finished running, or was the job still running at that time?

Thanks,

Charlotte