Status: Closed (closed by anh151 1 year ago)
For the score, the Named Allele Matcher counts every position for the reference allele and only the defining positions for all other alleles. When the variants in an unphased sample can be matched to one non-reference allele plus the reference allele, that combination therefore gets a higher score than the alternative in which the same variants are matched to two non-reference star alleles.
CYP2C19 is special in the sense that *1 (historically the reference allele) had several suballeles, of which just one, CYP2C19*1A (*1.001 in PharmVar), matched the Reference Sequence (RefSeq), while all other *1 suballeles had the variant at this position. The RefSeq has 80161A (c.991A), while nearly all CYP2C19 star alleles, including the *1 suballeles, have 80161G (c.991G). In June 2020, PharmVar introduced *38 as the star allele that matches the CYP2C19 RefSeq, and the CYP2C19*1 core allele is now defined by the amino acid change p.I331V.
Prior to this, CYP2C19*1 was the designated reference allele. We kept the behavior that for CYP2C19*1 (in addition to *38) all positions are counted when matching. Therefore, *1/*2 gets a higher score than *2/*11. If you set the option to output all matches, *2/*11 will be included.
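To illustrate the scoring rule described above, here is a minimal Python sketch. It is not PharmCAT's implementation: the gene, the positions (`p1` to `p4`), the allele names (`REF`, `A`, `B`, `C`) and the helper functions are all invented for illustration. It only assumes what was stated above, namely that alleles scored like the reference count every position and all other alleles count only their defining positions.

```python
from itertools import combinations_with_replacement

# Hypothetical toy gene: four tracked positions and four invented alleles.
# Each allele maps to the set of positions where it carries the variant.
POSITIONS = ("p1", "p2", "p3", "p4")
ALLELES = {
    "REF": frozenset(),            # reference at every position
    "A": frozenset({"p1", "p2"}),
    "B": frozenset({"p1"}),
    "C": frozenset({"p2"}),
}
# Alleles for which every position is counted (assumption: reference-like).
SCORE_ALL_POSITIONS = {"REF"}


def allele_score(name):
    """Count all positions for reference-like alleles, else defining positions."""
    if name in SCORE_ALL_POSITIONS:
        return len(POSITIONS)
    return len(ALLELES[name])


def candidate_diplotypes(alt_counts):
    """Return (score, diplotype) pairs that explain the unphased sample.

    alt_counts maps each position to the number of variant copies (0, 1 or 2).
    """
    results = []
    for a, b in combinations_with_replacement(ALLELES, 2):
        explained = {p: (p in ALLELES[a]) + (p in ALLELES[b]) for p in POSITIONS}
        if explained == alt_counts:
            results.append((allele_score(a) + allele_score(b), f"{a}/{b}"))
    return sorted(results, reverse=True)


# Unphased sample: heterozygous at p1 and p2, reference elsewhere.
sample = {"p1": 1, "p2": 1, "p3": 0, "p4": 0}
ranked = candidate_diplotypes(sample)
for score, diplotype in ranked:
    print(diplotype, score)
```

In this toy setup both `REF/A` and `B/C` explain the sample, but `REF/A` ranks first because the reference-like allele contributes all four positions to the score. That mirrors why *1/*2 is reported ahead of *2/*11, while the lower-scoring pair only shows up when all matches are requested.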
PharmCAT version 2.4.0
We previously discussed how different scenarios are handled when it comes to assuming the phasing of variants when the input data is unphased. I just want to get some clarity and make sure the behavior in these two examples is intentional.
Example 1: SLCO1B1
chr12:21176804A>G: 1/1 chr12:21176879C>A: 0/1 chr12:21196951A>G: 0/1 chr12:21206031A>G: 0/1
PharmCAT: *37/*43 or *14/*44
Example 2: CYP2C19
chr10:94775367A>G: 0/1 chr10:94775507G>A: 0/1 chr10:94781859G>A: 0/1 chr10:94842866A>G: 1/1
PharmCAT: *1/*2
To me both examples have the same ambiguity in phasing and should be handled the same. Is there a reason for the difference?
Thanks,
Andrew