Closed lauragails closed 1 year ago
Question to check my understanding RE phrank score: When reading the documentation, I assumed there would be 1 phrank score per phenotype studied, per gene, per individual. And this phrank score could be used to jumpstart clinical diagnoses.
The Phrank method generates a score for each gene to describe how well variants in that gene could explain a set of phenotypes. Phrank scores are not normalized, and the maximum phrank score increases with the number of phenotypes provided. Think of it as a way to rank sort candidates rather than a threshold.
For our implementation:
If I ran every individual proband in its own cohort, would I get different phrank scores for each gene (and preferably for tested phenotypes)?
Every unrelated proband should be in its own cohort with its own HPO terms. Think of cohorts as families.
Troubleshooting: This is the logifle for calculate_phrank, and I wanted to make sure nothing was impacted in the analysis
YAMLLoadWarning
is a warning that a function call we used is deprecated. No error has occurred.
Got it. So in cohort.yaml
I put "HP:0000717" for the whole cohort, because the whole cohort does have autism (but almost certainly with differing underlying genetic contributions). I will treat each cohort as a family going forward.
RE algorithm: Approximately can I assume: The set of HPO terms would be approximated by looking at the terms connected N degrees from this phenotype in: resources/hpo/hpoDag.txt
Genes mapped to each HPO term are in: resources/hpo/ensembl.hpoPhenotype.tsv
Thank you again!
I know our use of cohort is confusing, and we'll try to make it clearer going forward in future workflows. Maybe "case" or "family" would have been a better choice of words?
Hello,
ie title here is: "Calculate the "Phrank" phenotype match score for a list of phenotypes for every gene."
However there is 1 phrank score per gene in
COHORT_phrank.tsv
. What phenotype(s) are used to calculate the phrank score? Should this score be used to prioritize a gene in general? What is a good/high score?This is my distribution of phrank scores across genes:
Corollary: When I look into
svpack/COHORT.GRCh38.pbsv.svpack.tsv
I see a table with the following columns#mode family_id sample_id chr:pos:ref:alt genotype(sample,dad,mom) SVTYPE SVLEN SVANN CIPOS MATEID END gene highest_impact depths(sample,dad,mom) allele_balance(sample,dad,mom) gene_impact_transcript lof clinvar phrank
The phrank scores seem to match the gene-level scores in the previous file.
If I ran every individual proband in its own cohort, would I get different phrank scores for each gene (and preferably for tested phenotypes)?
/REDACTED/workflow/scripts/calculate_phrank.py:212: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. cohort_list = yaml.load("".join(yamlfile))