amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

Invalid SM tag in sam record #54

Open kevyin opened 9 years ago

kevyin commented 9 years ago

SNAP seems to copy some @RG-* header tags into the alignment section. This shouldn't happen for the SM and PL tags, because at the alignment level they are either not defined (PL) or have a different meaning (SM).

According to the SAM file spec, @RG-SM is the sample name. SNAP puts this into the alignment section as SM:Z:SomeSampleName. However, as an alignment tag, SM is defined in the SAM file spec as "SM i Template-independent mapping quality", which has type "i" (signed 32-bit integer). Tools such as qProfiler that try to convert the string into an integer will fail.
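To illustrate the failure mode, here is a minimal Python sketch of a strict consumer that reads SM as the integer the spec defines. The function names are hypothetical and this is not qProfiler's actual code; it only shows why a string-valued SM breaks integer conversion.

```python
def parse_optional_field(field):
    """Split a SAM optional field 'TAG:TYPE:VALUE' into its three parts."""
    tag, type_code, value = field.split(":", 2)
    return tag, type_code, value


def read_sm_as_mapq(field):
    """Hypothetical strict reader: treats SM as an integer mapping quality."""
    tag, _type_code, value = parse_optional_field(field)
    if tag != "SM":
        raise KeyError(f"expected SM, got {tag}")
    return int(value)  # raises ValueError when the value is a sample name


print(read_sm_as_mapq("SM:i:37"))
try:
    read_sm_as_mapq("SM:Z:SomeSampleName")
except ValueError as err:
    print("conversion failed:", err)
```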

Another such tag is @RG-PL, which appears as PL:Z:SomePlatform. This is not a defined alignment tag in the spec at all.

An example record that fails in qProfiler:

HWI-D00119:67:H7NV5ADXX:1:2107:1489:93313   77  *   0   0   *   *   0   0   ATAGAAGAGTGCCGCTCCGAGTTGGAGGTGCTGCAGCAGAGGCGGAATCCGGGGGAACGGGAATGGGGAAACCTGCCCTCCTTGTTCGAAGCCGTCAGCAA   ?@@DDDBA:2<DFHA<GI<GH???C9DBH?<9?<D;?98B#############################################################   LB:Z:library    PG:Z:SNAP   RG:Z:1  NM:i:0  SM:Z:jda    PU:Z:AFAKEFC1D_2_jda_Human_NULL_TEST_jd_chr21
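As a rough check, the optional fields of the record above can be compared against the types the spec assigns to predefined tags. The sketch below is an assumption-laden illustration: `SPEC_TYPES` is a small hand-picked subset of the spec's tag table and `find_type_mismatches` is a hypothetical helper, not part of SNAP or qProfiler.

```python
# Types the SAM spec assigns to a few predefined alignment tags.
# This is an illustrative subset, not the full table.
SPEC_TYPES = {"SM": "i", "NM": "i", "RG": "Z", "PG": "Z", "LB": "Z", "PU": "Z"}


def find_type_mismatches(optional_fields):
    """Return (tag, found_type, expected_type) for each tag whose type
    disagrees with SPEC_TYPES. Tags missing from the subset are skipped."""
    mismatches = []
    for field in optional_fields:
        tag, type_code, _value = field.split(":", 2)
        expected = SPEC_TYPES.get(tag)
        if expected is not None and type_code != expected:
            mismatches.append((tag, type_code, expected))
    return mismatches


# Optional fields copied from the example record.
fields = [
    "LB:Z:library",
    "PG:Z:SNAP",
    "RG:Z:1",
    "NM:i:0",
    "SM:Z:jda",
    "PU:Z:AFAKEFC1D_2_jda_Human_NULL_TEST_jd_chr21",
]

print(find_type_mismatches(fields))
```

Only the SM field is flagged, since it carries a string where the spec expects an integer.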