PharmGKB / PharmCAT

The Pharmacogenomic Clinical Annotation Tool
Mozilla Public License 2.0
120 stars 39 forks source link

Error-Least function gene cannot have more than 1 diplotype #150

Closed krukanna closed 11 months ago

krukanna commented 1 year ago

java.lang.IllegalStateException: Least function gene cannot have more than 1 diplotype: Reference/Reference; c.1129-5923C>G, c.1236G>A (HapB3)/c.1129-5923C>G, c.1236G>A (HapB3) at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.callDpyd(NamedAlleleMatcher.java:303) at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.call(NamedAlleleMatcher.java:196) at org.pharmgkb.pharmcat.Pipeline.call(Pipeline.java:235) at org.pharmgkb.pharmcat.PharmCAT.main(PharmCAT.java:166)

I've noticed this occur, when all positions in DPYD are 0/0 and I've got one missing genotype in chr1:97579893. This problem did not exist in previous versions.

BinglanLi commented 1 year ago

Thank you for reporting this error. Can you share a de-identified VCF file to demonstrate the issue?

krukanna commented 1 year ago

I've sent you 2 vcfs on email: pharmcat@pharmgkb.org I also tested the latest version 2.7.0 and there is additional error - also in DPYD, but in another samples. Sample 1 - works correct in 2.5.0, not in 2.6.0 and 2.7.0, error:

java.lang.IllegalStateException: Least function gene cannot have more than 1 diplotype: Reference/Reference; c.1129-5923C>G, c.1236G>A (HapB3)/c.1129-5923C>G, c.1236G>A (HapB3) at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.callDpyd(NamedAlleleMatcher.java:303) at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.call(NamedAlleleMatcher.java:196) at org.pharmgkb.pharmcat.Pipeline.call(Pipeline.java:235) at org.pharmgkb.pharmcat.PharmCAT.main(PharmCAT.java:166)

Sample 2 - works correct in 2.5.0 and 2.6.0, not in 2.7.0, error:

java.lang.RuntimeException: Multiple diplotypes found for DPYD at org.pharmgkb.pharmcat.phenotype.model.GenePhenotype.findDiplotype(GenePhenotype.java:126) at org.pharmgkb.pharmcat.reporter.model.result.Diplotype.annotateDiplotype(Diplotype.java:512) at org.pharmgkb.pharmcat.reporter.model.result.Diplotype.(Diplotype.java:108) at org.pharmgkb.pharmcat.reporter.caller.DpydCaller.inferUnphasedDiplotype(DpydCaller.java:126) at org.pharmgkb.pharmcat.reporter.caller.DpydCaller.inferFromHaplotypeMatches(DpydCaller.java:85) at org.pharmgkb.pharmcat.reporter.model.result.GeneReport.(GeneReport.java:200) at org.pharmgkb.pharmcat.phenotype.Phenotyper.initialize(Phenotyper.java:76) at org.pharmgkb.pharmcat.phenotype.Phenotyper.(Phenotyper.java:62) at org.pharmgkb.pharmcat.Pipeline.call(Pipeline.java:303) at org.pharmgkb.pharmcat.PharmCAT.main(PharmCAT.java:166)

krukanna commented 1 year ago

Hi, did you get vcf files to test?

hudja commented 1 year ago

Hi, I have the same problem here in 2.7.0 with both DPYD gene and Least function gene cannot have more than 1 diplotype:.

markwoon commented 1 year ago

We are working on the issue.

We plan on changing the way DPYD is handled, but unfortunately this is not a quick fix. I'm hoping to resolve this by the end of next week.

krukanna commented 12 months ago

Hi, I tested new version 2.7.1 and the error didn't occur. Is this already solved or need to wait to next release anyway?

markwoon commented 11 months ago

Apologies for not responding sooner, I was hoping to give you a better timeline.

2.7.1 does not include anything specific to solve this problem. I'm surprised it made any difference.

The next release, which I've just wrapped up testing for, will have a new DPYD algorithm, so I would recommend you hold off. Currently working on updating the documentation and a final round of testing before release. I'm hoping that it happens by the end of the week.

hudja commented 11 months ago

Hi, thank you for your comment. I tested 2.7.1, and prepared this comment before your update, so maybe you will still find this information useful. Unless my understanding is very wrong, the problem concerns HapB3. Basically, if one of the haplotypes is HapB3 + some other mutation, the program does not return the allele from the other chromosome. My data is phased, and I only show variable positions below (consider all not shown positions as REFs).

In this example, the call, if I understand correctly is [c.1129-5923C>G, c.1236G>A (HapB3) + c.85T>C (*9A)] / Reference with 1.5 activity score, but the program returns c.1129-5923C>G, c.1236G>A (HapB3) only and 0.5 activity score.

chr1:97573863   rs56038477  C|T C   c.1129-5923C>G, c.1236G>A (HapB3) - 0.5
chr1:97579893   rs75017182  G|C G   c.1129-5923C>G, c.1236G>A (HapB3) - 0.5
chr1:97883329   rs1801265   A|G A   c.85T>C (*9A) - 1.0

Here, the call is [c.1129-5923C>G, c.1236G>A (HapB3) + c.85T>C (*9A)] / c.1627A>G (*5) (activity score 1.5), but it returns c.1129-5923C>G, c.1236G>A (HapB3) only (activity score 0.5).

chr1:97515839   rs1801159   C|T T   c.1627A>G (*5) - 1.0
chr1:97573863   rs56038477  C|T C   c.1129-5923C>G, c.1236G>A (HapB3) - 0.5
chr1:97579893   rs75017182  G|C G   c.1129-5923C>G, c.1236G>A (HapB3) - 0.5
chr1:97883329   rs1801265   A|G A   c.85T>C (*9A) - 1.0

It seems to return a normal call if one allele is HapB3 only w/o extra mutations. I looked into the json reports and it seems that sourceDiplotype contains only 1 entry with correct allele1 each, but no allele2 (null), or it can be two entries, with allele1 present and allele2 missing in both. So smth. like this:

sourceDiplotype[0]/allele1/HapB3
sourceDiplotype[0]/allele2/null

or

sourceDiplotype[0]/allele1/HapB3
sourceDiplotype[0]/allele2/null
sourceDiplotype[1]/allele1/c.85T>C
sourceDiplotype[1]/allele2/null

This is how my first [c.1129-5923C>G, c.1236G>A (HapB3) + c.85T>C (*9A)] / Reference example looks like: https://ibb.co/R3dfwvp

Thank you!

markwoon commented 11 months ago

This should be fixed in 2.8.0.

hudja commented 11 months ago

Hello, unfortunately, I now have the following error with 2.8.0:

java.lang.IllegalStateException: STRAND MISMATCH 1
    at org.pharmgkb.pharmcat.haplotype.DpydHapB3Matcher.mergePhasedDiplotypeMatch(DpydHapB3Matcher.java:182)
    at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.callDpyd(NamedAlleleMatcher.java:310)
    at org.pharmgkb.pharmcat.haplotype.NamedAlleleMatcher.call(NamedAlleleMatcher.java:202)
    at org.pharmgkb.pharmcat.Pipeline.call(Pipeline.java:233)
    at org.pharmgkb.pharmcat.PharmCAT.main(PharmCAT.java:166)

My data was preprocessed with the latest (2.8.0) preprocessor. I have multiple such cases, this is just one example (all not-shown positions are REFs):

chr1    97515865    1|0
chr1    97573863    0|1
chr1    97579893    0|1
chr1    97699535    1|0
chr1    97883329    1|1