What is the current behavior?
Currently, PharmCAT produces a CYP2C9 *9/*11 call when run on an unphased VCF for this sample. However, when given a phased VCF containing multiple disjoint phase blocks (distinct PS tag values), the tool appears to treat the blocks as phased relative to each other and produces an "Unknown/Unknown" call.
If the current behavior is a bug, please provide the steps to reproduce and, if possible, your example input data via a Gist or similar.
We used this command template with v2.7.1 at the initial time of testing:
Example phased VCF subset that produces the behavior:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample_name
chr10 94949217 . A G 73.6 PASS . GT:GQ:DP:AD:VAF:PL:PS 0|1:69:113:51,62:0.548673:73,0,71:94949217
chr10 94981224 . C T 70.5 PASS . GT:GQ:DP:AD:VAF:PL:PS 0|1:65:193:95,98:0.507772:70,0,66:94972974
What is the expected behavior?
The expected behavior would be to recognize that these variants are not actually phased with each other (note that their PS tags differ) and then report the same *9/*11 call that was made with the unphased VCF.
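To illustrate the check we have in mind, here is a minimal Python sketch that groups phased calls by their PS value and reports whether two positions share a phase block. This is not PharmCAT code: the function names `phase_blocks` and `phased_together` are hypothetical, and the parsing assumes a single-sample VCF with whitespace-separated columns, as in the subset above.

```python
from collections import defaultdict

# The two records from the example subset above (single sample).
VCF_RECORDS = """\
chr10 94949217 . A G 73.6 PASS . GT:GQ:DP:AD:VAF:PL:PS 0|1:69:113:51,62:0.548673:73,0,71:94949217
chr10 94981224 . C T 70.5 PASS . GT:GQ:DP:AD:VAF:PL:PS 0|1:65:193:95,98:0.507772:70,0,66:94972974
"""


def phase_blocks(vcf_text):
    """Group phased calls by (chromosome, PS value). Hypothetical helper."""
    blocks = defaultdict(list)
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        chrom, pos = fields[0], int(fields[1])
        # Pair FORMAT keys (column 9) with the sample values (column 10).
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        genotype = sample.get("GT", "")
        phase_set = sample.get("PS", ".")
        if "|" not in genotype or phase_set == ".":
            continue  # unphased call, or no phase set recorded
        blocks[(chrom, phase_set)].append((pos, genotype))
    return dict(blocks)


def phased_together(vcf_text, pos_a, pos_b):
    """True only if both positions fall in the same phase block."""
    for calls in phase_blocks(vcf_text).values():
        positions = {pos for pos, _ in calls}
        if pos_a in positions and pos_b in positions:
            return True
    return False


if __name__ == "__main__":
    for (chrom, phase_set), calls in phase_blocks(VCF_RECORDS).items():
        print(chrom, phase_set, calls)
    print(phased_together(VCF_RECORDS, 94949217, 94981224))
```

For the two records above, the sketch finds two separate phase blocks, so the `0|1` genotypes cannot be assumed to sit on the same haplotype. A caller working along these lines would presumably need to treat cross-block combinations as unphased when matching diplotypes.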
What is the motivation / use case for changing the behavior?
It should reduce errors when phased datasets are provided, especially in longer genes where end-to-end phasing is less likely. In this instance it produced an Unknown/Unknown result, but there are likely cases where PharmCAT asserts an incorrect or incomplete result (e.g., the call should be ambiguous, but only one diplotype is reported).
Please tell us about your environment:
PharmCAT Version: v2.7.1 at the time of the initial test
JDK Version: Docker
Environment: Docker
Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, e.g. stackoverflow, gitter, etc.)
@whaleyr - It's possible some of the above information is out of date or has already been fixed, since I haven't run the full end-to-end pipeline recently. Let me know if you want me to try reproducing with a more recent version (not sure when I'll get to it, though).
Do you want to request a feature or report a bug? Report a bug as requested by @whaleyr in https://github.com/PacificBiosciences/pb-StarPhase/issues/6.