Open whaleyr opened 2 years ago
Update on the batch mode discussion (meeting notes from the Biobank data analyses call, 12/13/21). The discussion was about the matcher and phenotype output:

- CSV/TSV output format combining the results from all the separate PharmCAT runs.
- Proposed format: indexed by sample, with the genotypes by gene in one file and the phenotypes in a second file, plus an additional log file for warnings (needs further discussion: output after each of the matcher and phenotyper is possible).
- Could be provided as additional template scripts on the PharmCAT GitHub:
  - a GitHub wiki page would document uses of the template scripts;
  - we should not be responsible for maintaining those scripts.
After internal discussion we decided to close this issue about TSV output from the reporter.
The data that comes out of the reporter is quite large and complicated. Showing only the small portion that appears in the first table of the report glosses over a lot of the complexity and documentation that people should know when interpreting the results. We feel it would be a disservice to the user to have an option that discards all that information.
I am reopening this issue after the discussion of reporting a TSV to assist large-scale data analysis. It is not to generate a TSV across all samples of interest as we previously discussed, but to focus on extracting PGx inferences of a single sample.
The purpose is to help calculate PGx frequencies. I think there should be a warning that this TSV output should not be used as a substitute for the report when interpreting a person's PGx testing results or making prescribing recommendations.
There should be different tables for calculating different frequencies (genotypes vs. phenotypes). I think we can use the base file name in place of the Sample ID below. In addition, information on which variants are present or missing in the VCF is not listed here: it is helpful for quality checks but not so much for PGx frequency estimation.
For genotype frequencies, I am thinking about the following content:

| Sample ID | Diplotype Index | Diplotype | Haplotype Index | Haplotype | Function | Warning |
|---|---|---|---|---|---|---|
| S1 | Diplotype 1 | 2/3 | Haplotype 1 | *2 | Poor Function | Multiple Diplotypes |
| S1 | Diplotype 1 | 2/3 | Haplotype 2 | *3 | Poor Function | Multiple Diplotypes |
| S1 | Diplotype 2 | 4/5 | Haplotype 1 | *4 | Normal Function | Multiple Diplotypes |
| S1 | Diplotype 2 | 4/5 | Haplotype 2 | *5 | No Function | Multiple Diplotypes |
| S1 | Diplotype 3 | 6/7 | Haplotype 1 | *6 | Normal Function | Multiple Diplotypes |
| S1 | Diplotype 3 | 6/7 | Haplotype 2 | *7 | No Function | Multiple Diplotypes |
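
To make the genotype layout concrete, here is a minimal sketch of how such rows might be written, one row per haplotype. The input structure (a list of dicts with `name` and `haplotypes` keys) and the function names `genotype_rows` and `to_tsv` are hypothetical and are not PharmCAT's data model or API. The sample ID is taken from the base file name, as suggested above.

```python
import csv
import io
from pathlib import Path

HEADER = ["Sample ID", "Diplotype Index", "Diplotype",
          "Haplotype Index", "Haplotype", "Function", "Warning"]


def genotype_rows(input_path, diplotypes):
    """Yield one row per haplotype of each candidate diplotype.

    `diplotypes` is a hypothetical structure, e.g.
    [{"name": "2/3", "haplotypes": [("*2", "Poor Function"), ("*3", "Poor Function")]}]
    """
    # Base file name stands in for the sample ID.
    # Note: only the last suffix is dropped, so "S1.vcf.gz" would give "S1.vcf".
    sample_id = Path(input_path).stem
    # Flag every row of a sample that has more than one candidate diplotype.
    warning = "Multiple Diplotypes" if len(diplotypes) > 1 else ""
    for d_idx, diplotype in enumerate(diplotypes, start=1):
        for h_idx, (haplotype, function) in enumerate(diplotype["haplotypes"], start=1):
            yield [sample_id, f"Diplotype {d_idx}", diplotype["name"],
                   f"Haplotype {h_idx}", haplotype, function, warning]


def to_tsv(rows):
    """Serialize the rows, with a header line, as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping one haplotype per row means haplotype frequencies can be tallied directly from the `Haplotype` column, while the `Diplotype Index` column still lets a reader regroup rows into diplotypes.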
Note
For phenotype frequencies, I am thinking about the following content:

| Sample ID | Phenotype Index | Phenotype | Diplotype Index | Diplotype | Function | Warning |
|---|---|---|---|---|---|---|
| S1 | Phenotype 1 | Poor Metabolizer | Diplotype 1 | 2/3 | Poor Function/Poor Function | Discrepant Phenotypes |
| S1 | Phenotype 2 | Intermediate Metabolizer | Diplotype 1 | 4/5 | Normal Function/No Function | Discrepant Phenotypes |
| S1 | Phenotype 2 | Intermediate Metabolizer | Diplotype 2 | 6/7 | Normal Function/No Function | Discrepant Phenotypes |
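
As a rough illustration of the intended downstream use, the sketch below tallies phenotype frequencies from rows in this layout after the per-sample tables have been concatenated. The function name `phenotype_frequencies` is hypothetical, and the policy of excluding any sample that carries a warning or more than one phenotype is an assumption made for the example, not something decided in this issue.

```python
import csv
import io
from collections import Counter


def phenotype_frequencies(tsv_text):
    """Return {phenotype: fraction of unambiguous samples} from TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    phenotypes_by_sample = {}
    warned = set()
    for row in reader:
        sample = row["Sample ID"]
        phenotypes_by_sample.setdefault(sample, set()).add(row["Phenotype"])
        if row["Warning"].strip():
            warned.add(sample)

    counts = Counter()
    for sample, phenotypes in phenotypes_by_sample.items():
        # Assumed policy: skip samples that are flagged or ambiguous.
        if sample in warned or len(phenotypes) != 1:
            continue
        counts.update(phenotypes)

    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}
```

Each sample is counted once regardless of how many diplotype rows it has, so a sample with several diplotypes that all map to the same phenotype still contributes a single observation.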
Note:
Add TSV output as an option for Phenotyper and Reporter (perhaps NamedAlleleMatcher too?).
This was brought up in group discussion and issue #85