Closed damiansm closed 3 years ago
I think LR2PG will do better in this situation. Still polishing the code.
A related problem was observed in a real example. A case with a homozygous variant in CASP8 correctly hit the AR disease OMIM:607271 as the best phenotype match. However, the score was halved because the OMIM prioritiser only knew about AD forms of disease for this gene: the DiseaseDao query has a clause of disease.TYPE in ('D', 'C'), but no such clause is used by the hiPhive prioritiser (ModelServiceImpl), and OMIM:607271 has type=? in the table.
Intuitively
Plan to change hiPhive behaviour
Don't run the OMIM prioritiser with hiPhive anymore: remove it from the yml, but maybe add some extra checks. For other prioritisers such as Phenix it should still be run.
Try to populate inheritance for all Orphanet disease-gene associations in the disease table. This is tricky, as inheritance only seems to be recorded at the disease level for Orphanet.
Adjust the hiPhive SQL to only select association_type = C, D or possibly ?. It currently takes all types. Make sure this stays in sync with the OMIMPrioritiser logic.
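One way to keep the two queries in sync would be to have both build their filter from a single shared definition. A minimal sketch, assuming a hypothetical helper class (`AssociationTypeFilter` and its methods are not existing Exomiser code) and assuming `?` ends up in the accepted set, which the plan above only lists as a possibility:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Exomiser: a single source of truth for
// which disease-gene association types a prioritiser accepts.
final class AssociationTypeFilter {

    // Assumption: 'C' and 'D' come from the existing DiseaseDao clause;
    // '?' is included only because the plan says "possibly".
    static final List<String> ACCEPTED_TYPES = List.of("C", "D", "?");

    private AssociationTypeFilter() {
    }

    // In-memory check, usable wherever rows have already been loaded.
    static boolean isAccepted(String associationType) {
        return ACCEPTED_TYPES.contains(associationType);
    }

    // Builds the SQL fragment from the same constant list, so the hiPhive and
    // OMIM prioritiser queries cannot drift apart. The column name is passed in
    // because the thread refers to it both as disease.TYPE and association_type.
    static String sqlInClause(String column) {
        String values = ACCEPTED_TYPES.stream()
                .map(type -> "'" + type + "'")
                .collect(Collectors.joining(", "));
        return column + " IN (" + values + ")";
    }
}
```

Only fixed constants are concatenated into the SQL here, so there is no injection concern; anything user-supplied should still go through bind parameters.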
Orphanet MOI annotations are going to be added to phenotype_annotation.tab come mid-July: https://github.com/monarch-initiative/phenol/issues/208
@julesjacobsen Think we have a RC ready to test for this now?
@julesjacobsen Just checked this again for latest 13.0.0-SNAPSHOT and all looks good. Above CFTR example is resolved properly
Great - closing now.
Currently we identify the best phenotype match for a gene for a particular mode of inheritance (MOI) using the phenotype prioritiser and then apply the OMIM prioritiser to penalise incorrect MOI matches to human disease.
However, the OMIM prioritiser considers ALL diseases linked to the gene rather than just the one providing the phenotype evidence. You can therefore end up with a scenario where there is a great phenotype match to an AR disease for a rare, heterozygous, known pathogenic (ClinVar) variant (known to be pathogenic as hom or compound-het for that same AR disease). That variant scores highly under the AD MOI, and the phenotype score is not halved because an AD disease is also associated with that gene.
e.g. for the CFTR gene the disease table has:
OMIM:277180|OMIM:602421|Congenital bilateral absence of vas deferens|1080|D|R
OMIM:219700|OMIM:602421|Cystic fibrosis|1080|D|R
OMIM:211400|OMIM:602421|Bronchiectasis with or without elevated sweat chloride 1, modifier of|1080|S|D
OMIM:167800|OMIM:602421|Pancreatitis, hereditary|1080|S|D
ORPHA:60033||Idiopathic bronchiectasis|1080|D|
ORPHA:586||Cystic fibrosis|1080|D|
ORPHA:48||Congenital bilateral absence of vas deferens|1080|D|
ORPHA:399805||Male infertility with azoospermia or oligozoospermia due to single gene mutation|1080|S|
ORPHA:676||Hereditary chronic pancreatitis|1080|D|
We quite often get matches to OMIM:219700 based on a heterozygous (AD) known pathogenic variant in ClinVar, chr7:g.117559590ATCT>A (https://www.ncbi.nlm.nih.gov/clinvar/?term=22144%5Balleleid%5D), that is in the 1-10% MAF range in different populations. The OMIM prioritiser does not downrank this hit to an AR disease because AD diseases are associated with CFTR in the table.
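To make the failure mode concrete, here is a rough sketch of a gene-level MOI check as described in this issue. It is not the actual OMIMPrioritiser code; the class and method names are made up for illustration, and I am reading the last column of the OMIM rows above as inheritance (R = recessive, D = dominant). The 0.5 factor reflects the "halved" behaviour described in this thread.

```java
import java.util.List;

// Hypothetical illustration of a gene-level MOI check, not Exomiser code.
final class GeneLevelMoiCheck {

    enum Moi { AD, AR }

    private GeneLevelMoiCheck() {
    }

    // Returns the multiplier applied to the gene's phenotype score:
    // 1.0 if ANY disease linked to the gene is compatible with the candidate
    // MOI, otherwise 0.5. Which disease produced the phenotype match is ignored.
    static double geneLevelModifier(List<Moi> diseaseMoisForGene, Moi candidateMoi) {
        return diseaseMoisForGene.contains(candidateMoi) ? 1.0 : 0.5;
    }

    public static void main(String[] args) {
        // CFTR OMIM rows: two recessive (incl. cystic fibrosis), two dominant.
        List<Moi> cftr = List.of(Moi.AR, Moi.AR, Moi.AD, Moi.AD);
        // A heterozygous variant is assessed under AD. The dominant entries
        // mean no penalty, even though the phenotype hit is to an AR disease.
        System.out.println(geneLevelModifier(cftr, Moi.AD));
    }
}
```

With a gene that only has AR diseases the same call would return 0.5, which is the behaviour we want for the cystic fibrosis hit but do not get for CFTR.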
In an ideal world, when we are running the phenotype prioritiser we would adjust the score by whether it fits the MOI for that particular disease and then pick the best match. This would remove a lot of the false positives I am seeing in the 100KGP set based on ClinVar pathogenic variants.
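Something along these lines is what I have in mind, as a sketch only. All names here (`PerDiseaseMoiScoring`, `DiseaseMatch`, `adjustedScore`, `bestMatch`) are hypothetical and the phenotype scores in the example are invented. Treating an unknown disease MOI as compatible and using a 0.5 penalty are assumptions, not settled behaviour.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of per-disease MOI adjustment, not Exomiser code.
final class PerDiseaseMoiScoring {

    enum Moi { AD, AR, UNKNOWN }

    // One phenotype match between the patient and a disease linked to the gene.
    static final class DiseaseMatch {
        final String diseaseId;
        final Moi diseaseMoi;
        final double phenotypeScore;

        DiseaseMatch(String diseaseId, Moi diseaseMoi, double phenotypeScore) {
            this.diseaseId = diseaseId;
            this.diseaseMoi = diseaseMoi;
            this.phenotypeScore = phenotypeScore;
        }
    }

    private PerDiseaseMoiScoring() {
    }

    // Halves the score when this disease's own MOI does not fit the MOI the
    // variant is being assessed under. Assumption: UNKNOWN is not penalised.
    static double adjustedScore(DiseaseMatch match, Moi candidateMoi) {
        boolean compatible = match.diseaseMoi == Moi.UNKNOWN || match.diseaseMoi == candidateMoi;
        return compatible ? match.phenotypeScore : match.phenotypeScore * 0.5;
    }

    // Picks the best match AFTER the per-disease adjustment.
    static Optional<DiseaseMatch> bestMatch(List<DiseaseMatch> matches, Moi candidateMoi) {
        return matches.stream()
                .max(Comparator.comparingDouble((DiseaseMatch m) -> adjustedScore(m, candidateMoi)));
    }
}
```

For example, with an invented score of 0.9 for OMIM:219700 (AR) and 0.4 for OMIM:167800 (AD), a heterozygous variant assessed under AD would see the cystic fibrosis match drop to 0.45 rather than keep its full 0.9, while under AR it would keep 0.9.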