messersc closed this issue 8 years ago
I will have to do some proper digging here. My first guess was that the motif distinguishing A*02:01 from A*02:06 at the beginning of exon 2 is also present on the other allele, A*11:01, which would shadow A*02:06's edge over A*02:01. If that were the case, I'd expect the enumeration to find A*02:06 immediately, which it doesn't, so it must be trickier than that. BTW, does the sample have coverage at the beginning of exon 2 at all? Because if not, A*02:06 might have been thrown out in the pre-solving pruning (which we will cut back a lot on in OT2).
OT2 attempts to sort out these things, although it's hard and, as we discussed a while ago, not always possible. And this may be something else entirely. I will look into it some time later. Until then, if you could e-mail me the coverage plot, I'd be grateful.
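To make the shadowing hypothesis concrete, here is a toy sketch in Python. It is not OptiType's actual model: the read names, the hit sets, and the `explained` helper are all made up for illustration. It only shows why, if the exon 2 motif were shared by A*02:06 and A*11:01, both genotypes would explain the same number of reads and should therefore both show up when enumerating solutions.

```python
# Hypothetical read-to-allele hit sets; every read and hit here is invented.
reads = {
    "r1": {"A*02:01", "A*02:06"},  # sequence shared by both A*02 alleles
    "r2": {"A*02:06", "A*11:01"},  # exon 2 motif, assumed to be shared with A*11:01
    "r3": {"A*11:01"},             # read specific to A*11:01
}

def explained(genotype, reads):
    """Count reads that hit at least one allele of the given genotype."""
    alleles = set(genotype)
    return sum(1 for hits in reads.values() if hits & alleles)

score_0201 = explained(("A*02:01", "A*11:01"), reads)
score_0206 = explained(("A*02:06", "A*11:01"), reads)

# The motif read r2 is already explained by A*11:01, so A*02:06 gains nothing
# over A*02:01 and the two genotypes tie.
print(score_0201, score_0206)
```

Under these assumptions the two genotypes are tied, which is why a near-optimal enumeration would be expected to list the A*02:06 solution right away.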
You are right: there is no coverage in the region that distinguishes the two alleles (positions 154 and 158 of the references, if I counted correctly). So this is not a bug.
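A check like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sequences are short made-up strings standing in for pre-aligned allele references, the coverage values are invented, and `distinguishing_positions` and `uncovered` are hypothetical helpers, not part of OptiType.

```python
def distinguishing_positions(seq_a, seq_b):
    """Return 1-based positions where two aligned, equal-length sequences differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

def uncovered(positions, coverage):
    """Return the 1-based positions whose per-base read depth is zero."""
    return [p for p in positions if coverage[p - 1] == 0]

# Made-up stand-ins for the two allele references (not real HLA sequence).
ref_a = "ACGTACGTAC"
ref_b = "ACGAACGTTC"

# Made-up per-base read depth along the reference.
coverage = [5, 5, 5, 0, 0, 3, 3, 3, 0, 2]

diff = distinguishing_positions(ref_a, ref_b)
missing = uncovered(diff, coverage)

# If every distinguishing position is uncovered, the reads cannot tell
# the two alleles apart.
print(diff, missing)
```

If `missing` equals `diff`, no read can favor one allele over the other, which matches the situation described for this sample.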
Still wondering how they came to the right conclusion... https://figshare.com/articles/_HLA_Typing_from_1000_Genomes_Whole_Genome_and_Whole_Exome_Illumina_Data_/843210
One of the samples not correctly predicted by OptiType is ERR031857, where A*02:06 is misclassified as A*02:01.
Even when expanding the results (-e 5), the correct solution is not found.
Curiously, Major et al. (2013), for example, were able to predict the correct HLA types.
Does anybody have an idea why this happens? (I would be interested to know whether OptiType 2 can handle this case.)