Closed gkaur closed 3 years ago
I think I have sorted out this issue. The output from method II was accurate. Somehow things got messed up when I tried to batch-process the files on the cluster, as I did in method I.
I verified this by running BLASR on its own with the alignment options that SMRTSV2 uses. The output was similar to that of method I.
I have been running the align step on a cluster. The reads for one sample are in 10 BAM files. I tried aligning them with two methods:
I: I ran the align step on each BAM file separately and then merged the outputs together.
smrtsv2 align --batches 1 --threads 10 <single_input_bam.fofn>
II: I ran align in a single batch with all the BAM files in the same run.
smrtsv2 align --batches 1 --threads 35 <list_of_10_input_bams.fofn>
The merged output from the first method is 623 GB. The single BAM output from the second method is 217 GB. When I look at the header, I see the relevant read tags from all 10 input files. Both runs completed successfully.
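File size alone can be misleading here, because BAM files are compressed and the size depends on sort order and compression settings as well as on the number of records. One way to compare the two outputs more directly is to count alignment records and distinct read names in each. The following is only a minimal sketch, not part of SMRTSV2 or BLASR: the function name `summarize_sam` is hypothetical, and it assumes the BAM has already been converted to SAM text (for example with `samtools view -h`). It relies only on the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
# Standard SAM FLAG bits used to classify each record.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_sam(lines):
    """Summarize SAM text: record counts by type and number of distinct reads.

    `lines` is any iterable of SAM-format text lines (header lines included).
    This is a hypothetical helper for illustration only.
    """
    summary = {
        "records": 0,
        "primary": 0,
        "secondary": 0,
        "supplementary": 0,
        "unmapped": 0,
    }
    read_names = set()
    for line in lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        summary["records"] += 1
        read_names.add(qname)
        if flag & UNMAPPED:
            summary["unmapped"] += 1
        elif flag & SECONDARY:
            summary["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            summary["supplementary"] += 1
        else:
            summary["primary"] += 1
    summary["distinct_reads"] = len(read_names)
    return summary
```

Running this over the SAM text of both outputs and comparing the two summaries would show whether the larger file really contains more alignments (for example, many more records per read) or just compresses less well. Note that it keeps every read name in memory, so for files of this size it may be more practical to run it on a subset of the reads.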
I am wondering whether any of the BLASR alignment parameters specified by SMRTSV2 are causing this to happen. Are read alignments not independent of each other?
Any help will be much appreciated!