PharmGKB / PharmCAT

The Pharmacogenomic Clinical Annotation Tool
Mozilla Public License 2.0

Phasing ambiguity question #152

Closed by anh151 1 year ago

anh151 commented 1 year ago

PharmCAT version 2.4.0

We previously discussed how different scenarios are handled when it comes to assuming the phasing of variants when the input data is unphased. I just want to get some clarity and make sure the behavior in these two examples is intentional.

Example 1: SLCO1B1

chr12:21176804A>G: 1/1
chr12:21176879C>A: 0/1
chr12:21196951A>G: 0/1
chr12:21206031A>G: 0/1

PharmCAT: *37/*43 or *14/*44

Example 2: CYP2C19

chr10:94775367A>G: 0/1
chr10:94775507G>A: 0/1
chr10:94781859G>A: 0/1
chr10:94842866A>G: 1/1

PharmCAT: *1/*2

To me both examples have the same ambiguity in phasing and should be handled the same. Is there a reason for the difference?

Thanks,

Andrew

katrinsangkuhl commented 1 year ago

When scoring, the Named Allele Matcher counts every position for the reference allele, but only the defining positions for all other alleles. So when the variants in an unphased sample can be matched to one non-reference allele with the reference allele as the second allele, that diplotype gets a higher score than an alternative in which the same variants match two non-reference star alleles.
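To make the scoring rule concrete, here is a minimal Python sketch. It is not PharmCAT code: the allele names (`*ref`, `*ab`, `*a`, `*b`), the positions, and the helper functions are all hypothetical, and the only thing it models is "reference counts every position, other alleles count their defining positions".

```python
# Toy illustration of the scoring rule described above.
# All names and definitions here are hypothetical, not real PharmCAT data.

POSITIONS = ["p1", "p2", "p3", "p4"]   # positions in the toy gene definition
REFERENCE = "*ref"                      # toy reference allele

# Defining variants per allele (position -> base). The reference has none.
DEFINING = {
    "*ref": {},
    "*ab": {"p1": "G", "p2": "A"},      # carries both variants
    "*a": {"p1": "G"},                  # carries only the p1 variant
    "*b": {"p2": "A"},                  # carries only the p2 variant
}


def allele_score(name):
    """Reference counts every position; other alleles count defining positions."""
    if name == REFERENCE:
        return len(POSITIONS)
    return len(DEFINING[name])


def diplotype_score(a, b):
    """A diplotype's score is the sum of its two allele scores."""
    return allele_score(a) + allele_score(b)


# An unphased sample that is heterozygous at p1 and p2 is consistent with
# both of these diplotypes, depending on how the variants are phased.
candidates = [("*ref", "*ab"), ("*a", "*b")]
best = max(candidates, key=lambda d: diplotype_score(*d))
print(best, diplotype_score(*best))
```

In this toy setup `*ref/*ab` scores 4 + 2 = 6 while `*a/*b` scores 1 + 1 = 2, so the diplotype that includes the reference allele is ranked first.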

CYP2C19 is special in that *1 (historically the reference allele) had several suballeles, of which only one, CYP2C19*1A (*1.001 in PharmVar), matched the Reference Sequence (RefSeq); all other *1 suballeles carried a variant at position 80161. The RefSeq has 80161A (c.991A), while nearly all CYP2C19 star alleles, including the *1 suballeles, have 80161G (c.991G). In June 2020 PharmVar introduced *38 as the star allele that matches the CYP2C19 RefSeq, and the CYP2C19*1 core allele is now defined by the amino acid change p.I331V.

Prior to this, CYP2C19*1 was the designated reference allele. We kept the behavior that, for CYP2C19*1 (in addition to *38), all positions are counted when matching. Therefore, *1/*2 gets a higher score than *2/*11. When you set the option to output all matches, *2/*11 will be included.
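The effect of counting all positions for a legacy reference allele can be sketched the same way. The following is a toy model under stated assumptions, not the PharmCAT implementation: the alleles `*R` (in the role of *38), `*L` (in the role of *1), `*X`, `*Y`, and `*Z`, the three positions, and every function name are hypothetical. It enumerates each possible phasing of an unphased sample, calls both haplotypes, and ranks the resulting diplotypes by score.

```python
from itertools import product

# Toy gene: hypothetical positions and reference bases (not real CYP2C19 data).
POSITIONS = ["p1", "p2", "p3"]
REF_SEQ = {"p1": "A", "p2": "G", "p3": "A"}

# Hypothetical defining variants; unspecified positions equal the reference.
DEFS = {
    "*R": {},                                    # matches the reference sequence
    "*L": {"p3": "G"},                           # legacy reference allele
    "*X": {"p1": "G", "p2": "A", "p3": "G"},
    "*Y": {"p1": "G", "p3": "G"},
    "*Z": {"p2": "A", "p3": "G"},
}
# Alleles scored over every position rather than only defining positions.
SCORE_ALL_POSITIONS = {"*R", "*L"}


def full_sequence(name):
    """Reference sequence overlaid with the allele's defining variants."""
    merged = {**REF_SEQ, **DEFS[name]}
    return tuple(merged[p] for p in POSITIONS)


def score(name):
    if name in SCORE_ALL_POSITIONS:
        return len(POSITIONS)
    return len(DEFS[name])


def call_haplotype(hap):
    """Names of all alleles whose full sequence equals this haplotype."""
    return [n for n in DEFS if full_sequence(n) == hap]


def rank_diplotypes(genotypes):
    """Try every phasing of unphased genotypes; return diplotypes by score."""
    results = {}
    # Each position's pair of bases can be assigned to the two strands either way.
    options = [(genotypes[p], genotypes[p][::-1]) for p in POSITIONS]
    for combo in product(*options):
        hap1 = tuple(pair[0] for pair in combo)
        hap2 = tuple(pair[1] for pair in combo)
        for a in call_haplotype(hap1):
            for b in call_haplotype(hap2):
                results[tuple(sorted((a, b)))] = score(a) + score(b)
    return sorted(results.items(), key=lambda kv: (-kv[1], kv[0]))


# Unphased sample: heterozygous at p1 and p2, homozygous variant at p3.
sample = {"p1": ("A", "G"), "p2": ("G", "A"), "p3": ("G", "G")}
ranked = rank_diplotypes(sample)
print("top match:", ranked[0])
print("all matches:", ranked)
```

In this toy, `*L/*X` scores 3 + 3 = 6 and is reported first, while `*Y/*Z` scores 2 + 2 = 4 and only shows up when all matches are listed. If `*L` were scored on its single defining position instead, the two diplotypes would tie at 4.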