PharmGKB / PharmCAT

The Pharmacogenomic Clinical Annotation Tool
Mozilla Public License 2.0

UnexpectedStateException: Multiple phenotypes for gene CYP2D6 #190

Closed fmobegi closed 3 weeks ago

fmobegi commented 1 month ago

Reporting a bug

Reporter fails in research mode for CYP2D6

https://gist.github.com/fmobegi/d711f4be1a55b3aadb92db3e6a9ee300

Generate an HTML report from the phenotyper JSON file

java -version
java version "17.0.8" 2023-07-18 LTS
Java(TM) SE Runtime Environment (build 17.0.8+9-LTS-211)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.8+9-LTS-211, mixed mode, sharing)
cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Log text

Pre-processing vcf file

 python preprocessor/pharmcat_vcf_preprocessor.py -vcf testfile.vcf.gz -G -s sample03_barcode11
PharmCAT VCF Preprocessor version: 2.15.0
Downloading reference FASTA.  This may take a while...

Bypass the gVCF check.

Saving output to /data/tools/PharmCAT-2.15.0

Processing testfile.vcf.gz ...
  * WARNING: "chr22:42128211 REF=CG ALT=C" does not match PharmCAT expectation of REF at "chr22:42128211 REF=C ALT=CG"
Adding back non-PGx variants at PGx positions...
* Cataloging 1064 missing positions in testfile.missing_pgx_var.vcf

Generated PharmCAT-ready VCF: /data/tools/PharmCAT-2.15.0/testfile.preprocessed.vcf.bgz

Done.
Preprocessed input VCF in 0.21 seconds

phenotyper

java -jar pharmcat-2.15.0-all.jar -vcf testfile.preprocessed.vcf.bgz -bf testfile -research cyp2d6
WARNING: CYP2D6 RESEARCH MODE ENABLED
WARNING: REPORTER MODULE NOT AVAILABLE IN RESEARCH MODE
chr22:42126624
        Duplicate entry found in VCF; this entry trumps others.
chr22:42128211
        Discarded genotype at this position because REF in VCF (CG) does not match expected reference (C)
Saving VCF warnings to /data/tools/PharmCAT-2.15.0/testfile.match_warnings.txt
Saving named allele matcher JSON results to /data/tools/PharmCAT-2.15.0/testfile.match.json
Saving phenotyper JSON results to /data/tools/PharmCAT-2.15.0/testfile.phenotype.json
Done.

reporter

java -jar pharmcat-2.15.0-all.jar -reporter -ri testfile.phenotype.json
org.pharmgkb.pharmcat.UnexpectedStateException: Multiple phenotypes for gene CYP2D6
        at org.pharmgkb.pharmcat.reporter.model.result.AnnotationReport.addGenotype(AnnotationReport.java:148)
        at org.pharmgkb.pharmcat.reporter.model.result.GuidelineReport.lambda$matchAnnotations$5(GuidelineReport.java:162)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at org.pharmgkb.pharmcat.reporter.model.result.GuidelineReport.matchAnnotations(GuidelineReport.java:162)
        at org.pharmgkb.pharmcat.reporter.model.result.GuidelineReport.<init>(GuidelineReport.java:62)
        at org.pharmgkb.pharmcat.reporter.model.result.DrugReport.<init>(DrugReport.java:81)
        at org.pharmgkb.pharmcat.reporter.ReportContext.<init>(ReportContext.java:73)
        at org.pharmgkb.pharmcat.Pipeline.call(Pipeline.java:330)
        at org.pharmgkb.pharmcat.PharmCAT.main(PharmCAT.java:198)

BinglanLi commented 1 month ago

Could you please share a de-identified test VCF file so we can reproduce the error?

markwoon commented 1 month ago

@BinglanLi Partial VCF is in gist linked above

whaleyr commented 1 month ago

This may be a regression of a previous bug; a malformed diplotype object may have snuck through. I'm investigating to find the particular bad data.

@fmobegi Your sample file is extremely odd and will therefore give extremely odd, unusable output data. What are you trying to test with this particular input? Every position is either heterozygous or homozygous ALT. That's just not good data.
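
For anyone who wants to check the "every position is heterozygous or homozygous ALT" observation on their own file before running PharmCAT, a rough tally of genotype calls is enough. The following is only a minimal sketch and not part of PharmCAT: `tally_genotypes` and `open_vcf` are hypothetical helper names, and it assumes a standard tab-delimited VCF whose FORMAT column contains `GT`.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    if path.endswith((".gz", ".bgz")):
        return gzip.open(path, "rt")
    return open(path, "rt")


def tally_genotypes(vcf_lines, sample_index=0):
    """Count hom_ref / het / hom_alt / missing calls for one sample column."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # record has no sample columns
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt = fields[9 + sample_index].split(":")[fmt.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1
        elif all(a == "0" for a in alleles):
            counts["hom_ref"] += 1
        elif len(set(alleles)) == 1:
            counts["hom_alt"] += 1
        else:
            counts["het"] += 1
    return counts
```

With the preprocessed file from this report, the call would look like `tally_genotypes(open_vcf("testfile.preprocessed.vcf.bgz"))`. A `hom_ref` count of zero across the PGx positions is a hint that reference calls were dropped or overwritten somewhere upstream.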

fmobegi commented 1 month ago

It could be because we are merging variants from different tools to create a single VCF file. The idea is to have SNPs, indels, and SVs combined in one file. This is still experimental, as I am building a Nextflow pipeline for CYP2D6 genotyping using FASTQ files. I go through the usual steps (QC >> Mapping to chr22 >> Variant-calling >> Variant QC >> Variant Annotation). Previously, I had used PharmCAT 2.6.1 in research mode to get an HTML report for CYP2D6. I will try to go through the steps and see if the merging and QC of variants is failing.

whaleyr commented 4 weeks ago

Yes, I think you have problems upstream from PharmCAT. I would resolve that first and then give it another run through PharmCAT.

Also, if you're using the year-old 2.6.0 version, I recommend updating to the latest release for updated data and performance improvements.
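
The two warnings in the log above (a duplicate entry at chr22:42126624 and a REF mismatch at chr22:42128211) are the kind of upstream problem that can be caught on the merged VCF before it reaches PharmCAT. The sketch below is an illustration under stated assumptions, not PharmCAT's own check: `find_vcf_problems` is a hypothetical helper, and the `expected_ref` map has to be supplied by the caller (here it would hold only the one expectation reported in the log, REF=C at chr22:42128211).

```python
from collections import Counter


def find_vcf_problems(vcf_lines, expected_ref):
    """Return (duplicates, ref_mismatches) for a list of VCF text lines.

    expected_ref maps (chrom, pos) -> expected REF string. It is an
    assumption of this sketch and must be built by the caller.
    """
    seen = Counter()
    mismatches = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a usable record
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        key = (chrom, pos)
        seen[key] += 1
        want = expected_ref.get(key)
        if want is not None and ref != want:
            mismatches.append((chrom, pos, ref, want))
    # Positions that appear on more than one record
    duplicates = sorted(k for k, n in seen.items() if n > 1)
    return duplicates, mismatches
```

Each mismatch is reported as `(chrom, pos, ref_in_vcf, expected_ref)`, so a record like the one in the log would show up as `("chr22", 42128211, "CG", "C")`. Duplicates are worth resolving at the merge step, since the phenotyper keeps only one entry per position.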

markwoon commented 3 weeks ago

Please re-open this issue if you can reproduce it with the current PharmCAT release.