Closed msilver727 closed 1 month ago
Thank you for reporting the issue. We are investigating it.
Based on your input VCF, your sample is indeed *4/*4. You are right that rs3892097 is the core allele-defining variant for *4. However, rs1135840 and rs1065852 are also present in all *4 suballeles but one; because of that one exceptional suballele, they are not included in the PharmVar *4 core allele (see the PharmVar CYP2D6 gene page and the PharmVar criteria page). The CPIC/PharmGKB CYP2D6 allele definition table is based on PharmVar core alleles. In addition, a variant that is in the core allele of one star allele and is also present in some but not all suballeles of another star allele is included in that other star allele as an ambiguous change, written with the IUPAC nucleotide code (see the Notes tab of the CYP2D6 Allele Definition Table). Currently, PharmVar defines 175 CYP2D6 alleles. rs1135840 and rs1065852 are part of 21 CYP2D6 core star alleles and are further included in suballeles of 3 star alleles (*4, *56, and *150).
Therefore, CYP2D6 *10 is defined by the presence of rs1135840 and rs1065852 and the absence of variants defining other star alleles.
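To make the reasoning above concrete, here is a minimal Python sketch of the idea. It is a toy illustration only, not PharmCAT's Named Allele Matcher: the `TOY_DEFINITIONS` table, the `required`/`optional` split, and the `match_haplotype` function are all hypothetical names, and the table holds just the three variants discussed in this issue rather than the real allele definition table. The `optional` set stands in for positions that the definition table marks as ambiguous with an IUPAC code.

```python
# Toy allele definitions (illustrative only, NOT the real PharmVar/CPIC table).
# "required": variants that must be present on the haplotype.
# "optional": variants that may or may not be present (the role played by
#             IUPAC ambiguity codes in the allele definition table).
TOY_DEFINITIONS = {
    "*1": {"required": set(), "optional": set()},
    "*4": {"required": {"rs3892097"}, "optional": {"rs1135840", "rs1065852"}},
    "*10": {"required": {"rs1135840", "rs1065852"}, "optional": set()},
}


def match_haplotype(observed):
    """Return the toy star alleles whose definition fully explains `observed`.

    An allele matches when all of its required variants are observed and
    every observed variant is either required or optional for that allele.
    """
    matches = []
    for name, definition in TOY_DEFINITIONS.items():
        required = definition["required"]
        allowed = required | definition["optional"]
        if required <= observed and observed <= allowed:
            matches.append(name)
    return matches


# Haplotype from this issue: the *4 variant plus both *10 variants.
reported = {"rs3892097", "rs1135840", "rs1065852"}
print(match_haplotype(reported))

# Haplotype with only the two *10 variants.
print(match_haplotype({"rs1135840", "rs1065852"}))
```

In this toy model the reported haplotype matches only *4, because *10 does not allow rs3892097, while a haplotype carrying only rs1135840 and rs1065852 matches only *10. That mirrors the statement that *10 requires the absence of variants defining other star alleles.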
For more on how PharmCAT determines a PGx allele, please visit https://pharmcat.org/methods/NamedAlleleMatcher-101/.
For more on calling CYP2D6 from VCF including limitations (e.g., structural and copy number variation have a large influence on inferring CYP2D6 phenotype, but are beyond the scope of what can be called from SNPs or INDELs in a VCF file), please visit https://pharmcat.org/using/Calling-CYP2D6/.
Do you want to request a feature or report a bug? bug
What is the current behavior? For the provided sample, PharmCAT is calling the CYP2D6 diplotype as *4/*4, but in addition to being homozygous for the tag SNP of *4 (rs3892097 chr22:42128945:C/T), the subject is also homozygous for both tag SNPs of *10 (rs1135840 chr22:42126611:C>G and rs1065852 chr22:42130692:G>A). Therefore, I believe the more appropriate genotype call would be *4+*10/*4+*10.
If the current behavior is a bug, please provide the steps to reproduce and, if possible, your example input data via a Gist or similar. I have attached the VCF for the subject, containing only the lines corresponding to CYP2D6. I ran the latest PharmCAT jar file (v2.14.0) on the attached VCF with --research combinations,cyp2d6.
What is the expected behavior? The pipeline should call this subject as *4+*10/*4+*10.
What is the motivation / use case for changing the behavior? More accurate genotype calling.
Please tell us about your environment:
Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, e.g. stackoverflow, gitter, etc.) TEST_SUBJECT_VCF.zip