FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data
BSD 3-Clause "New" or "Revised" License

ERR031857 #27

Closed by messersc 8 years ago

messersc commented 8 years ago

One of the samples not correctly predicted by OptiType is ERR031857, where A*02:06 is misclassified as A*02:01.

Even when expanding the results (-e 5), the correct solution is not found:

    A1       A2       B1       B2       C1       C2       Reads  Objective
0   A*02:01  A*11:01  B*07:05  B*54:01  C*07:02  C*01:02  413    394.4049999999999
1   A*02:01  A*11:01  B*07:05  B*55:02  C*07:02  C*01:02  407    388.6749999999999
2   A*02:01  A*11:01  B*54:01  B*07:02  C*07:02  C*01:02  400    381.97999999999985
3   A*02:01  A*11:01  B*55:02  B*07:02  C*07:02  C*01:02  394    376.24999999999983
4   A*02:01  A*11:01  B*07:05  B*55:04  C*07:02  C*01:02  392    374.34000000000003

Curiously, others, e.g. Major et al. (2013), were able to predict the correct HLA types.

Does anybody have an idea why this happens? (I would be interested to know whether OptiType 2 can handle this case.)

andras86 commented 8 years ago

I will have to do some proper digging here. My first guess was that the distinguishing motif between A*02:01 and A*02:06 at the beginning of exon 2 is also present on the other allele, A*11:01, which would shadow A*02:06's edge over A*02:01. If that were the case, I'd expect the enumeration to find A*02:06 immediately, which it doesn't, so it must be trickier than that. BTW, is there any coverage at the beginning of exon 2 at all? If not, A*02:06 might have been thrown out in the pre-solving pruning (which we will cut back on a lot in OT2).

OT2 attempts to sort out these things, although it's hard and, as we discussed a while ago, not always possible. This may also be something else entirely. I will look into it some time later. Until then, I'd be grateful if you could e-mail me the coverage plot.
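
To make the "shadowing" guess concrete, here is a toy sketch in Python. It is not OptiType's actual objective or code; the read names, the hit sets, and the `explained_reads` helper are all hypothetical. It only illustrates that if the read carrying the distinguishing motif also matches A*11:01, then both candidate pairs explain the same number of reads, so the two should at least tie in the enumeration.

```python
from itertools import combinations

# Hypothetical toy data: each read maps to the set of alleles it matches.
# "r_motif" carries the motif that separates A*02:06 from A*02:01; in this
# assumed scenario the same motif also occurs on A*11:01.
read_hits = {
    "r_shared_1": {"A*02:01", "A*02:06"},
    "r_shared_2": {"A*02:01", "A*02:06"},
    "r_motif": {"A*02:06", "A*11:01"},
    "r_a11": {"A*11:01"},
}


def explained_reads(pair, hits):
    """Count reads that match at least one allele of the candidate pair."""
    return sum(1 for alleles in hits.values() if alleles & set(pair))


# Score every pair of alleles seen in the toy data.
alleles = sorted(set().union(*read_hits.values()))
scores = {
    pair: explained_reads(pair, read_hits)
    for pair in combinations(alleles, 2)
}

for pair, score in scores.items():
    print(pair, score)
```

In this toy setup the pairs (A*02:01, A*11:01) and (A*02:06, A*11:01) explain the same number of reads, which is why a pure shadowing effect would be expected to surface A*02:06 among the enumerated solutions.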

messersc commented 8 years ago

You are right: there is no coverage at the region that distinguishes the two alleles (positions 154 and 158 of the references, if I counted correctly). So this is not a bug.
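
For anyone who wants to repeat this kind of check, a minimal Python sketch follows. It assumes two already-aligned reference sequences of equal length and a per-position depth list; the sequences, the coverage values, and both helper names are made up for illustration and are not the real A*02:01/A*02:06 references or anything from OptiType.

```python
def distinguishing_positions(seq_a, seq_b):
    """Return 1-based positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def uncovered(positions, coverage, min_depth=1):
    """Return the 1-based positions whose read depth is below min_depth."""
    return [p for p in positions if coverage[p - 1] < min_depth]


# Toy stand-ins for two closely related allele references (hypothetical).
ref_x = "ACGTACGTAC"
ref_y = "ACGAACGTTC"

# Toy per-position read depth over the same coordinates (hypothetical).
coverage = [0, 0, 0, 0, 3, 5, 5, 4, 0, 2]

diff = distinguishing_positions(ref_x, ref_y)
blind = uncovered(diff, coverage)

print("distinguishing positions:", diff)
print("without coverage:", blind)
```

If every distinguishing position ends up in the uncovered list, the reads carry no information that could separate the two alleles, whatever the solver does afterwards.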

Still wondering how they came to the right conclusion... https://figshare.com/articles/_HLA_Typing_from_1000_Genomes_Whole_Genome_and_Whole_Exome_Illumina_Data_/843210