EichlerLab / smrtsv2

Structural variant caller

align step output comparison #55

Closed gkaur closed 3 years ago

gkaur commented 4 years ago

I have been running the align step on a cluster. My reads for one sample are spread across 10 BAM files. I tried aligning them with two methods:

I: I ran the align step on each BAM separately and then merged the outputs together: smrtsv2 align --batches 1 --threads 10 <single_input_bam.fofn>

II: I ran align in a single batch with all the BAM files in the same run: smrtsv2 align --batches 1 --threads 35 <list_of_10_input_bams.fofn>

The merged output from the first method is 623GB. The single BAM output from the second method is 217GB. When I look at the header, I see the relevant read tags from all 10 input files. Both runs completed successfully.

I am wondering whether any BLASR alignment parameters specified by SMRTSV2 are causing this to happen. Are read alignments not independent of each other?

Any help will be much appreciated!

gkaur commented 4 years ago

I think I sorted out this issue. The output from method II was accurate. Somehow things got messed up when I tried to batch-process the files on the cluster, as I did in method I.

I verified this by running BLASR on its own along with the alignment options that SMRTSV2 uses. The output was similar to that of method I.
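
For anyone comparing two alignment outputs like this, file size alone does not show whether reads are missing or duplicated. Below is a minimal sketch, not part of SMRTSV2, that counts alignment records per read name in two SAM text streams and reports the differences. The function names are hypothetical, and it assumes each BAM has been converted to SAM text first (for example with `samtools view -h`).

```python
from collections import Counter


def count_alignments_per_read(sam_lines):
    """Count alignment records per read name (QNAME), skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        # SAM header lines start with '@'; blank lines carry no record.
        if not line.strip() or line.startswith("@"):
            continue
        # QNAME is the first tab-separated field of an alignment record.
        qname = line.split("\t", 1)[0]
        counts[qname] += 1
    return counts


def compare_outputs(sam_a, sam_b):
    """Summarize how two sets of alignment records differ by read name."""
    a = count_alignments_per_read(sam_a)
    b = count_alignments_per_read(sam_b)
    shared = set(a) & set(b)
    return {
        "records_a": sum(a.values()),
        "records_b": sum(b.values()),
        "reads_only_in_a": sorted(set(a) - set(b)),
        "reads_only_in_b": sorted(set(b) - set(a)),
        # Reads present in both outputs but with a different number of records,
        # e.g. because one output contains duplicated alignments.
        "reads_with_different_counts": sorted(q for q in shared if a[q] != b[q]),
    }
```

Both arguments can be any iterable of lines, such as open file handles on SAM text. The per-read counts are kept in memory, so for outputs of several hundred GB it may be more practical to run this on one input file or region at a time.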