HazyResearch / dd-genomics

The Genomics DeepDive project
Apache License 2.0

Charite Pheno Recall Analysis #290

Colossus closed this issue 8 years ago.

Colossus commented 8 years ago

I'm not sure this is the best place, but I'm going to post my Charite pheno recall analysis here, i.e., why are we extracting far fewer phenos than Charite has?

Colossus commented 8 years ago

This query returns all Charite genephenos whose pheno we never pick up:

select distinct 
  hpo_id, 
  canonical_name 
from 
  charite c 
  join genes g 
    on (g.ensembl_id = c.ensembl_id) 
  left join pheno_mentions p 
    on (c.hpo_id = p.entity) 
where p.doc_id is null;

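To sanity-check the LEFT JOIN ... IS NULL anti-join pattern used above, here is a minimal sketch that runs the same query against an in-memory SQLite database. The table layouts (only the columns the query touches) and every row are hypothetical toy data, not the real dd-genomics schema or contents.

```python
import sqlite3

# Hypothetical, stripped-down versions of the three tables the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE charite (ensembl_id TEXT, hpo_id TEXT);
    CREATE TABLE genes (ensembl_id TEXT, canonical_name TEXT);
    CREATE TABLE pheno_mentions (doc_id TEXT, entity TEXT);

    -- Toy rows: HP:A is mentioned in some document, HP:B never is.
    INSERT INTO charite VALUES ('ENSG1', 'HP:A'), ('ENSG1', 'HP:B'), ('ENSG2', 'HP:B');
    INSERT INTO genes VALUES ('ENSG1', 'GENE1'), ('ENSG2', 'GENE2');
    INSERT INTO pheno_mentions VALUES ('doc1', 'HP:A');
""")

# Anti-join: keep Charite rows whose phenotype has no matching mention,
# i.e. rows where the LEFT JOIN produced NULLs on the pheno_mentions side.
missed = conn.execute("""
    SELECT DISTINCT c.hpo_id, g.canonical_name
    FROM charite c
    JOIN genes g ON g.ensembl_id = c.ensembl_id
    LEFT JOIN pheno_mentions p ON c.hpo_id = p.entity
    WHERE p.doc_id IS NULL
    ORDER BY c.hpo_id, g.canonical_name
""").fetchall()

print(missed)  # [('HP:B', 'GENE1'), ('HP:B', 'GENE2')]
```

The ORDER BY is only there to make the toy output deterministic; the original query does not need it.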
Colossus commented 8 years ago

So, how many Charite phenos are not "allowed" (allowed = a phenotypic abnormality that is not a cancer)?

select count(distinct hpo_id) from charite;
6074

select count(distinct hpo_id) from charite where hpo_id not in (select distinct hpo_id from allowed_phenos);
346

So there are almost no "disallowed" phenos: only 346 of 6074 (about 6%).
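One way to read the "allowed" definition: a term is allowed if it descends from Phenotypic abnormality (HP:0000118) but not from Neoplasm (HP:0002664). The sketch below assumes that reading and uses a toy is_a graph with made-up term IDs (HP:X1 to HP:X4); how dd-genomics actually builds allowed_phenos is not shown in this thread, and the helper names are hypothetical. The last two lines mirror the "not in allowed_phenos" count above.

```python
from collections import defaultdict

PHENOTYPIC_ABNORMALITY = "HP:0000118"
NEOPLASM = "HP:0002664"

def descendants(root, parents):
    """Return root plus every term reachable below it in an is_a graph.

    `parents` maps each child term to the list of its parent terms.
    """
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def allowed_phenos(parents):
    # Assumed rule: under "Phenotypic abnormality", but not under "Neoplasm".
    return descendants(PHENOTYPIC_ABNORMALITY, parents) - descendants(NEOPLASM, parents)

# Toy is_a graph with hypothetical term IDs.
toy_parents = {
    NEOPLASM: [PHENOTYPIC_ABNORMALITY],
    "HP:X1": [PHENOTYPIC_ABNORMALITY],   # ordinary abnormality
    "HP:X2": [NEOPLASM],                 # a cancer
    "HP:X3": ["HP:X1", NEOPLASM],        # two parents; one path reaches Neoplasm
    "HP:X4": ["HP:0000001"],             # outside Phenotypic abnormality
}
charite_ids = {"HP:X1", "HP:X2", "HP:X3", "HP:X4"}
disallowed = charite_ids - allowed_phenos(toy_parents)
print(sorted(disallowed))  # ['HP:X2', 'HP:X3', 'HP:X4']
```

Note that a term with any path to Neoplasm is excluded here, even if it also sits under a non-cancer parent; that is a design choice of this sketch.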

Colossus commented 8 years ago

We should automatically add synonyms that turn "Abnormality of skin physiology", and every other "Abnormality of X", into "abnormal X".

"Unossified sacrum" is hard to find: there are only 11 sentences in the whole database containing the word "unossified".
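For the "Abnormality of X" to "abnormal X" synonym idea, a minimal regex sketch. The helper name is hypothetical, and treating an optional "the" after "of" is an assumption borrowed from the "abnormality of (the)" wording later in the thread.

```python
import re

# Matches names like "Abnormality of skin physiology" or "Abnormality of the eye".
ABNORMALITY_OF = re.compile(r"^abnormality of (?:the )?(.+)$", re.IGNORECASE)

def abnormal_synonym(name):
    """Return the "abnormal X" synonym for an "Abnormality of X" name, else None."""
    m = ABNORMALITY_OF.match(name.strip())
    return f"abnormal {m.group(1).lower()}" if m else None

print(abnormal_synonym("Abnormality of skin physiology"))  # abnormal skin physiology
print(abnormal_synonym("Unossified sacrum"))               # None
```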

Colossus commented 8 years ago

Why don't we find anything for "Thymoma" (HP:0100522)? This should be an easy one.

EDIT: It's a cancer.

Colossus commented 8 years ago

Maybe just drop all "abnormality of" and "abnormal" prefixes, as in "abnormal eye physiology".
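Dropping the prefix outright (rather than rewriting it) could look like this minimal sketch; the helper name is hypothetical and only one leading prefix is removed.

```python
import re

# Hypothetical helper: strip one leading "abnormality of (the)" or "abnormal".
PREFIX = re.compile(r"^(?:abnormality of (?:the )?|abnormal )", re.IGNORECASE)

def drop_abnormal_prefix(name):
    return PREFIX.sub("", name.strip(), count=1)

print(drop_abnormal_prefix("abnormal eye physiology"))  # eye physiology
```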

Colossus commented 8 years ago

"Neoplasm", "cancer", and "tumor" should all become "tumor". Insert these synonyms manually.
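A minimal sketch of a manually curated word-level synonym table that maps "neoplasm" and "cancer" (and their plurals) onto "tumor". The table, helper name, and example names are hypothetical; real HPO names may need more variants than this.

```python
import re

# Manually curated word-level synonyms; "tumor" is the canonical form.
MANUAL_SYNONYMS = {"neoplasm": "tumor", "neoplasms": "tumors",
                   "cancer": "tumor", "cancers": "tumors"}

WORD = re.compile(r"[A-Za-z]+")

def tumor_synonym(name):
    """Return `name` with neoplasm/cancer words replaced by tumor(s), or None if unchanged."""
    out = WORD.sub(lambda m: MANUAL_SYNONYMS.get(m.group(0).lower(), m.group(0)), name)
    return out if out != name else None

print(tumor_synonym("Thymic neoplasm"))  # Thymic tumor
```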

Colossus commented 8 years ago

Why don't we pick up "stillbirth" (HP:0003826)?

EDIT: It's not a phenotypic abnormality. We're getting a little morbid here.

Colossus commented 8 years ago

Chop off "morphology" and "physiology" suffixes, as in "Abnormal trabecular bone morphology" or "abnormal eye physiology", unless all that remains is a simple English word such as "eye".
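A minimal sketch of the suffix-chopping rule. "Only one word remains once an abnormal-style prefix is ignored" is an assumed proxy for "a simple English word such as eye"; the helper name is hypothetical.

```python
import re

SUFFIX = re.compile(r"\s+(?:morphology|physiology)$", re.IGNORECASE)
ABNORMAL_PREFIX = re.compile(r"^(?:abnormality of (?:the )?|abnormal )", re.IGNORECASE)

def chop_suffix(name):
    """Return `name` without a trailing "morphology"/"physiology".

    Returns None when there is no such suffix, or when (ignoring an
    "abnormal"/"abnormality of" prefix) only one word would remain.
    """
    name = name.strip()
    chopped = SUFFIX.sub("", name)
    core = ABNORMAL_PREFIX.sub("", chopped)
    if chopped == name or len(core.split()) <= 1:
        return None
    return chopped

print(chop_suffix("Abnormal trabecular bone morphology"))  # Abnormal trabecular bone
print(chop_suffix("abnormal eye physiology"))              # None
```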

Colossus commented 8 years ago

Should we leave "sarcoma" phenotypes in and allow them, such as HP:0200058 (angiosarcoma)?

EDIT: Forget about them; they're cancers.

Colossus commented 8 years ago

Split phenos containing a slash and create a synonym for each alternative. It won't work perfectly, but it's an improvement.

Colossus commented 8 years ago

Split only the slashed word and keep the rest of the name.
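A minimal sketch of splitting only the slashed word, assuming names are whitespace-separated and that only the first slashed word is split. The helper name is hypothetical; the example name is illustrative.

```python
def split_slashed(name):
    """Create one synonym per alternative in the slashed word, keeping the other words."""
    words = name.split()
    for i, w in enumerate(words):
        if "/" in w:
            return [" ".join(words[:i] + [alt] + words[i + 1:])
                    for alt in w.split("/") if alt]
    return []  # no slash, no synonyms

print(split_slashed("Aplasia/Hypoplasia of the thumb"))
# ['Aplasia of the thumb', 'Hypoplasia of the thumb']
```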

Colossus commented 8 years ago

If dropping "abnormality of (the)" or "abnormal" leaves only one word, don't add that single word as a synonym; instead, append "physiology", "morphology", "dysplasia", "hypoplasia", and "aplasia" to it and add those.
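A minimal sketch of the single-word rule, applied to whatever is left after the prefix has been dropped. The helper name is hypothetical.

```python
SUFFIXES = ["physiology", "morphology", "dysplasia", "hypoplasia", "aplasia"]

def expand_single_word(remainder):
    """Keep a multi-word remainder as-is, but never emit a lone word on its own."""
    words = remainder.split()
    if len(words) == 1:
        return [f"{words[0]} {s}" for s in SUFFIXES]
    return [remainder] if words else []

print(expand_single_word("eye"))
# ['eye physiology', 'eye morphology', 'eye dysplasia', 'eye hypoplasia', 'eye aplasia']
```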

Colossus commented 8 years ago

Replace all "physiology" with "morphology".

Colossus commented 8 years ago

In general, when dropping "abnormality", add "physiology", "morphology", "dysplasia", "hypoplasia", and "aplasia" variants.
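A minimal sketch of this general rule, combined with the earlier prefix-dropping and suffix-chopping ideas. The order of steps (drop the prefix, strip an existing morphology/physiology suffix so it isn't doubled, then add one variant per suffix word) is an assumption, as is the helper name; it does not apply the "replace physiology with morphology" idea.

```python
import re

PREFIX = re.compile(r"^(?:abnormality of (?:the )?|abnormal )", re.IGNORECASE)
SUFFIX = re.compile(r"\s+(?:morphology|physiology)$", re.IGNORECASE)
VARIANT_SUFFIXES = ["physiology", "morphology", "dysplasia", "hypoplasia", "aplasia"]

def abnormality_synonyms(name):
    """Return suffix variants for a name whose abnormal-style prefix was dropped."""
    name = name.strip()
    core = PREFIX.sub("", name, count=1)
    if core == name:
        return set()  # nothing dropped, so no variants to add
    core = SUFFIX.sub("", core).lower()  # avoid e.g. "skin physiology physiology"
    return {f"{core} {s}" for s in VARIANT_SUFFIXES}
```

For "Abnormality of skin physiology" this yields the five "skin ..." variants; names without an abnormal-style prefix get no variants.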