PharmGKB / PharmCAT

The Pharmacogenomic Clinical Annotation Tool
Mozilla Public License 2.0

Add option for TSV output #86

Open whaleyr opened 2 years ago

whaleyr commented 2 years ago

Add TSV output as an option for Phenotyper and Reporter (perhaps NamedAlleleMatcher too?).

This was brought up in group discussion and issue #85

katrinsangkuhl commented 2 years ago

Update on the batch mode discussion (meeting notes from the Biobank data analyses call, 12/13/21): the discussion was about the matcher and phenotype output.

whaleyr commented 2 years ago

After internal discussion we decided to close this issue about TSV output from the reporter.

The data that comes out of the reporter is quite large and complicated. Showing only the small portion that appears in the first table of the report glosses over a lot of the complexity and documentation that people should know when interpreting the results. We feel it would be a disservice to the user to have an option that discards all that information.

BinglanLi commented 2 years ago

I am reopening this issue following the discussion about producing a TSV to assist large-scale data analysis. The goal is not to generate one TSV across all samples of interest, as we previously discussed, but to extract the PGx inferences of a single sample.

The purpose is to help calculate PGx frequencies. I think there should be a warning that this TSV output should not be used as a substitute for the report when interpreting a person's PGx testing results or prescribing recommendations.

There should be different tables for calculating different frequencies (genotypes vs. phenotypes). I think we can use the base file name as the Sample ID below. In addition, information on present and missing variation in the VCF is not listed here, because it is helpful for quality checks but not so much for PGx frequency estimation.
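As a rough illustration of using the base file name as the Sample ID, here is a minimal Python sketch. The function name `sample_id_from_path` and the list of extensions it strips are assumptions made for this example; this is not PharmCAT code.

```python
from pathlib import Path

# Hypothetical helper: extensions to strip are an assumption for this sketch.
VCF_SUFFIXES = (".vcf.gz", ".vcf.bgz", ".vcf")


def sample_id_from_path(vcf_path: str) -> str:
    """Derive a Sample ID from the input file's base name."""
    name = Path(vcf_path).name
    for suffix in VCF_SUFFIXES:
        if name.lower().endswith(suffix):
            # Drop the VCF extension, keep everything before it.
            return name[: -len(suffix)]
    # Not a recognized VCF extension: fall back to dropping the last suffix.
    return Path(name).stem
```

With this sketch, `data/S1.vcf.gz` and `S1.vcf` would both map to the Sample ID `S1`.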

For genotype frequencies, I am thinking about the following content:

| Sample ID | Diplotype Index | Diplotype | Haplotype Index | Haplotype | Function | Warning |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | Diplotype 1 | 2/3 | Haplotype 1 | *2 | Poor Function | Multiple Diplotypes |
| S1 | Diplotype 1 | 2/3 | Haplotype 2 | *3 | Poor Function | Multiple Diplotypes |
| S1 | Diplotype 2 | 4/5 | Haplotype 1 | *4 | Normal Function | Multiple Diplotypes |
| S1 | Diplotype 2 | 4/5 | Haplotype 2 | *5 | No Function | Multiple Diplotypes |
| S1 | Diplotype 3 | 6/7 | Haplotype 1 | *6 | Normal Function | Multiple Diplotypes |
| S1 | Diplotype 3 | 6/7 | Haplotype 2 | *7 | No Function | Multiple Diplotypes |
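To make the proposed genotype layout concrete, here is a minimal Python sketch that flattens a sample's diplotype calls into one row per haplotype and serializes them as TSV. The input shape (a list of haplotype pairs, each a `(name, function)` tuple) and the names `genotype_rows` and `genotype_tsv` are hypothetical and do not reflect how PharmCAT represents its results. The sketch builds the diplotype label by joining the haplotype names, so it emits `*2/*3` where the table above shows `2/3`.

```python
import csv
import io

GENOTYPE_COLUMNS = [
    "Sample ID", "Diplotype Index", "Diplotype",
    "Haplotype Index", "Haplotype", "Function", "Warning",
]


def genotype_rows(sample_id, diplotypes, warning=""):
    """Yield one row per haplotype of each candidate diplotype.

    `diplotypes` is an assumed input shape: a list of diplotypes, each a
    sequence of (haplotype_name, function) tuples.
    """
    for d_idx, haplotypes in enumerate(diplotypes, start=1):
        label = "/".join(name for name, _ in haplotypes)
        for h_idx, (name, function) in enumerate(haplotypes, start=1):
            yield [sample_id, f"Diplotype {d_idx}", label,
                   f"Haplotype {h_idx}", name, function, warning]


def genotype_tsv(sample_id, diplotypes, warning=""):
    """Return the header plus all genotype rows as a TSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(GENOTYPE_COLUMNS)
    writer.writerows(genotype_rows(sample_id, diplotypes, warning))
    return buffer.getvalue()


if __name__ == "__main__":
    calls = [
        [("*2", "Poor Function"), ("*3", "Poor Function")],
        [("*4", "Normal Function"), ("*5", "No Function")],
    ]
    print(genotype_tsv("S1", calls, warning="Multiple Diplotypes"), end="")
```

Repeating the Sample ID and warning on every row keeps each line self-describing, which makes it straightforward to concatenate per-sample files later.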

Note

For phenotype frequencies, I am thinking about the following content:

| Sample ID | Phenotype Index | Phenotype | Diplotype Index | Diplotype | Function | Warning |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | Phenotype 1 | Poor Metabolizer | Diplotype 1 | 2/3 | Poor Function/Poor Function | Discrepant Phenotypes |
| S1 | Phenotype 2 | Intermediate Metabolizer | Diplotype 1 | 4/5 | Normal Function/No Function | Discrepant Phenotypes |
| S1 | Phenotype 2 | Intermediate Metabolizer | Diplotype 2 | 6/7 | Normal Function/No Function | Discrepant Phenotypes |
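As a sketch of how per-sample phenotype TSVs in this layout might feed a frequency calculation, the Python below tallies one phenotype per sample. The function name `phenotype_frequencies` is hypothetical. How to count samples with discrepant phenotypes is not settled in this thread, so pooling them under an "Ambiguous" label is purely an assumption of this example.

```python
import csv
import io
from collections import Counter


def phenotype_frequencies(tsv_texts):
    """Return {phenotype: fraction of samples} from per-sample TSV strings.

    Each string is expected to carry a header row with a "Phenotype" column.
    """
    counts = Counter()
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        phenotypes = {row["Phenotype"] for row in reader}
        if not phenotypes:
            continue  # header-only file: nothing to count
        # Assumption: a sample with more than one distinct phenotype is
        # counted once under "Ambiguous" rather than split across phenotypes.
        key = phenotypes.pop() if len(phenotypes) == 1 else "Ambiguous"
        counts[key] += 1
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

Counting per sample rather than per row matters here, because a sample with several candidate diplotypes contributes several rows for the same phenotype.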

Note: