Open bredeson opened 6 years ago
Ah, after another few sips of coffee it occurred to me that if ASSUME_SORTED=false, then Picard checks the @HD SO tag to determine the sort order for itself. However, if I set ASSUME_SORTED=true and attempt to run Picard on my SAMtools-sorted (naturally sorted) BAM file, it still fails with the same error. So either way, the ASSUME_SORTED option does not seem to be working at all...
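For anyone who wants to see which sort order a file declares, here is a minimal Python sketch that reads the SO tag from the @HD line of SAM header text (for example, text obtained by dumping the header with samtools). The function name parse_sort_order and the sample header are illustrative assumptions; this only shows what the header says, not what Picard does with it.

```python
def parse_sort_order(header_text: str) -> str:
    """Return the SO value from the @HD header line, or "unknown" if absent.

    Hypothetical helper for illustration. SAM header fields are
    tab-separated TAG:VALUE pairs following the record type (@HD).
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[len("SO:"):]
            break  # at most one @HD line is expected; stop once it is seen
    return "unknown"


# Made-up example header for demonstration purposes only.
example_header = "@HD\tVN:1.5\tSO:queryname\n@SQ\tSN:chr1\tLN:1000\n"
print(parse_sort_order(example_header))
```

Note that SO:queryname by itself does not say whether the names were ordered naturally or lexicographically, which is the ambiguity at the heart of this issue.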
Hello, as you mentioned above, Picard assumes records are sorted lexicographically, so records are compared using String.compareTo. However, samtools sorts records as follows: a record that has a number starting at the current position is less than a record that has a letter at that position. If both records have numbers starting at the current position, they are compared by their numeric values. If both records have letters (rather than numbers) at the current position, they are compared lexicographically, but only up to the next number.
To support this ordering, SAMRecordQueryNameComparator would have to be rewritten.
So does Picard need this type of sorting?
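To make the difference concrete, here is a minimal Python sketch of the two orderings described above. Plain Python string comparison stands in for Java's String.compareTo (the two agree for ASCII read names), and natural_compare is a hypothetical helper written from the description in this comment; it is not the actual samtools or Picard code.

```python
from functools import cmp_to_key


def _is_digit(ch: str) -> bool:
    # Restrict to ASCII digits so int() below is always safe.
    return "0" <= ch <= "9"


def natural_compare(a: str, b: str) -> int:
    """Compare two read names 'naturally' (illustrative sketch only)."""
    i = j = 0
    while i < len(a) and j < len(b):
        da, db = _is_digit(a[i]), _is_digit(b[j])
        if da and db:
            # Both names have a number here: compare whole digit runs by value.
            end_i = i
            while end_i < len(a) and _is_digit(a[end_i]):
                end_i += 1
            end_j = j
            while end_j < len(b) and _is_digit(b[end_j]):
                end_j += 1
            num_a, num_b = int(a[i:end_i]), int(b[j:end_j])
            if num_a != num_b:
                return -1 if num_a < num_b else 1
            i, j = end_i, end_j
        elif da:
            return -1  # a number sorts before a non-number
        elif db:
            return 1
        else:
            # Both are non-digits: compare character by character.
            if a[i] != b[j]:
                return -1 if a[i] < b[j] else 1
            i += 1
            j += 1
    # One name is a prefix of the other: the shorter one sorts first.
    return (i < len(a)) - (j < len(b))


names = [
    "M02484:3:000000000-AAFE5:1:1101:2793:9990",
    "M02484:3:000000000-AAFE5:1:1101:2793:10567",
]
print(sorted(names))                                    # lexicographic order
print(sorted(names, key=cmp_to_key(natural_compare)))   # natural order
```

With the two read names from the original report, the lexicographic sort puts ...:10567 first (because "1" < "9"), while the natural sort puts ...:9990 first (because 9990 < 10567). That disagreement is what the sort-order check trips over.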
Hey @denklewer,
I think that there should be an option that allows the user to specify that the reads are grouped by name (R1 and all its secondary/supplementary alignments are grouped with its R2 and all its secondary/supplementary alignments), as is usually the case for unsorted aligner BAM output. If Picard does not need the reads to be strictly queryname-sorted (only grouped) for FixMateInformation to function correctly, that option should exist.
Thanks!
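To illustrate what "grouped but not sorted" means, here is a small Python sketch of such a check. The function is_grouped_by_name is hypothetical and is not an existing Picard option or API: it only verifies that every read name occupies one contiguous run, regardless of the order in which the runs appear.

```python
from typing import Iterable


def is_grouped_by_name(names: Iterable[str]) -> bool:
    """Return True if each read name appears in a single contiguous run.

    Hypothetical helper for illustration; keeps every name seen so far in
    memory, which is fine for a sketch but costly for very large files.
    """
    seen = set()
    previous = None
    for name in names:
        if name != previous:
            if name in seen:
                return False  # this name already ended an earlier run
            seen.add(name)
            previous = name
    return True


# Grouped but in neither natural nor lexicographic order: still acceptable.
print(is_grouped_by_name(["r9990", "r9990", "r10567", "r10567", "r2", "r2"]))
# The same name split across two runs: not grouped.
print(is_grouped_by_name(["r9990", "r10567", "r9990"]))
```

A check like this accepts both samtools-style and Picard-style queryname ordering, since any fully sorted file is also grouped.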
Hey Picard Devs,
I'm using Picard 2.17.6 and am trying to run FixMateInformation on a BAM file that I sorted by queryname with SAMtools v1.6; however, SAMtools interprets "queryname" as sorting queries naturally, whereas Picard sorts queries lexicographically:
In the above queryname_nat.bam, Picard is choking when it encounters read M02484:3:000000000-AAFE5:1:1101:2793:10567 sorting after M02484:3:000000000-AAFE5:1:1101:2793:9990. To demonstrate that this was a difference between sorting naturally and sorting lexicographically, I sorted the BAM file using Unix sort:
Then I re-ran FixMateInformation to completion:
Also, note that the ASSUME_SORTED option is set to false and Picard is not ignoring the SO tag in the @HD header (and is therefore not attempting to re-sort the BAM). Is this intended?