amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

GRCh38 HLA contig names get turned into invalid @SQ lines #107

Closed: fnothaft closed this issue 7 years ago

fnothaft commented 7 years ago

If you use the hs38DH build with the HLA contigs, the HLA contig names:

>HLA-A*01:01:01:01      HLA00001 3503 bp

get turned into @SQ header lines that fail HTSJDK's header validation:

@SQ     SN:HLA-A*01:01:01:01    HLA00001_3503_bp        LN:3503
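The header line appears to break because the FASTA description stays attached to the contig name. Judging from the output, spaces in the description become underscores, but the separator after the name (presumably a tab in hs38DH) survives, so the @SQ line gains an extra field with no TAG: prefix. The Python sketch below checks @SQ fields against the TAG:VALUE shape the SAM specification requires. The helper invalid_sq_fields and the tab-separated reconstruction of the line are assumptions for illustration, not SNAP or HTSJDK code.

```python
import re

# A SAM header field must be TAG:VALUE: a two-character tag matching
# [A-Za-z][A-Za-z0-9], a colon, then printable ASCII (no tabs).
HEADER_FIELD = re.compile(r"[A-Za-z][A-Za-z0-9]:[ -~]+")

def invalid_sq_fields(sq_line: str) -> list[str]:
    """Hypothetical helper: return @SQ fields that are not TAG:VALUE pairs."""
    record_type, *fields = sq_line.rstrip("\n").split("\t")
    if record_type != "@SQ":
        raise ValueError(f"not an @SQ line: {record_type!r}")
    return [f for f in fields if not HEADER_FIELD.fullmatch(f)]

# Reconstruction of the reported line, ASSUMING a tab separates the name
# from the leftover FASTA description.
bad = "@SQ\tSN:HLA-A*01:01:01:01\tHLA00001_3503_bp\tLN:3503"
good = "@SQ\tSN:HLA-A*01:01:01:01\tLN:3503"
print(invalid_sq_fields(bad))   # ['HLA00001_3503_bp']
print(invalid_sq_fields(good))  # []
```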
fnothaft commented 7 years ago

This leads to invalid SAM records as well:

htsjdk.samtools.SAMFormatException: Error parsing text SAM file. Non-numeric value in POS column; Line 13902
Line: H06JUADXX130110:1:2106:5794:11991 99  HLA-DQB1*05:01:01:02    HLA06615_7090_bp    3929    28  250M    =   4216    480 TCTCATAAAATTGTGCCCTCTATTTTACTCCCAGTCTGTTTAAGATGAACAAATCTTACAAGGTCACATAGCTGACTGTGATATCAGTTGGACTCCAGGAAGGAGAACCTAAAGAAAAGTTCAAGTCCAAGCAGAAACCGTGATTCCTTCCGGATGATGGCTCAAGAGTGATGTTTAACTGGGATGCAACCTGCTGACCTCAGCAAATCCTAGTTATATGTATGTGTTCACATTACAGGCTCATTAGCCC  ??>>?@@?@???????>>@>@@?????>@>>>@???@@@?@?????@??>????>@??>?????>?>?????@?@>@?A@@@AA@AA@AAAB?BA@BAABB@BBABB?ABABBABBBBAAB@@AAB@ABBAABABBB?A:AACAA?>BA@A:AC@@AAA@AC@BBABA@ABA@ABAAB?BAAACAAABB@BCAABAB?AB@BAABBBAAACAAAAAAAA?@@AAAABAAB@BAAA@B@@AB@BAA?<979  PG:Z:SNAP   NM:i:0  RG:Z:FASTQ  PL:Z:Illumina   PU:Z:pu LB:Z:lb SM:Z:sm
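The same embedded separator plausibly explains the read error: the reference name spans two columns, so every later mandatory field shifts right by one and the description lands in POS. A rough Python check of the integer columns, run on hypothetical shortened copies of the record (SEQ and QUAL replaced by `*`, optional tags dropped; the function name is illustrative only), shows the shift:

```python
# Mandatory SAM columns that must hold integers, as (0-based index, name).
INTEGER_COLUMNS = [(1, "FLAG"), (3, "POS"), (4, "MAPQ"), (7, "PNEXT"), (8, "TLEN")]

def mandatory_column_problems(sam_line: str) -> list[str]:
    """Hypothetical helper: report integer SAM columns that fail to parse."""
    cols = sam_line.rstrip("\n").split("\t")
    if len(cols) < 11:
        return [f"only {len(cols)} columns; SAM records need at least 11"]
    problems = []
    for index, name in INTEGER_COLUMNS:
        try:
            int(cols[index])
        except ValueError:
            problems.append(f"non-numeric {name}: {cols[index]!r}")
    return problems

# ASSUMPTION: a tab splits the reference name from its description.
broken = "\t".join(["read1", "99", "HLA-DQB1*05:01:01:02", "HLA06615_7090_bp",
                    "3929", "28", "250M", "=", "4216", "480", "*", "*"])
fixed = "\t".join(["read1", "99", "HLA-DQB1*05:01:01:02",
                   "3929", "28", "250M", "=", "4216", "480", "*", "*"])
print(mandatory_column_problems(broken))
# ["non-numeric POS: 'HLA06615_7090_bp'", "non-numeric PNEXT: '='"]
print(mandatory_column_problems(fixed))  # []
```

The shifted record also puts the true position (3929) into MAPQ, which is why only a name without whitespace yields a parseable line.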
fnothaft commented 7 years ago

This is my error; I should be providing the -B/-bSpace options during the index build.
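The fix is to have the index builder cut each contig name at the first terminator character. Assuming, as the option names suggest, that -B supplies name-terminator characters and -bSpace adds the space character (check the help text of your SNAP version; how tabs are treated is not confirmed here), the effect can be sketched in Python. contig_name is a hypothetical helper, and the name pattern is the reference-name regex given in recent versions of the SAM specification.

```python
import re

# Reference-name pattern from recent SAM specifications: printable ASCII,
# no whitespace, and not starting with '*' or '='.
SAM_RNAME = re.compile(r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*")

def contig_name(fasta_header: str, terminators: str = " \t") -> str:
    """Hypothetical helper: keep the FASTA header only up to the first terminator."""
    name = fasta_header.rstrip("\n").removeprefix(">")
    for i, ch in enumerate(name):
        if ch in terminators:
            return name[:i]
    return name

# ASSUMPTION: hs38DH separates the HLA name from its description with a tab.
header = ">HLA-A*01:01:01:01\tHLA00001 3503 bp"
name = contig_name(header)
print(name)                                    # HLA-A*01:01:01:01
print(bool(SAM_RNAME.fullmatch(name)))         # True
print(bool(SAM_RNAME.fullmatch(header[1:])))   # False: embedded tab and spaces
print(contig_name(">chr1|Chromosome 1", "|"))  # chr1
```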