TheJacksonLaboratory / MIMIC_HPO

HPO inference #40

Open kingmanzhang opened 4 years ago

kingmanzhang commented 4 years ago

Inferring HPO terms helps aggregate samples, but unnecessary inference is unhelpful and adds noise to the analysis, for example when it produces very general terms such as "Phenotypic abnormality" or "ALL". How should we determine the level of aggregation? Here is a proposal:

Let S be the set of all terms observed in an experiment. Starting from the root, perform a breadth-first search: if a node has at least two children that are directly or indirectly observed, it should be inferred; otherwise it should be excluded.

[Image: Picture1]

Cyan nodes are observed; orange nodes should be inferred.

kingmanzhang commented 4 years ago

A node is indirectly observed if at least one of its children is directly observed. An observation can come from direct mapping of lab tests or from text mining.
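To make the proposal concrete, here is a minimal Python sketch of the breadth-first rule. It is only an illustration under some assumptions: the ontology is given as a hypothetical `children` dict (parent term to list of child terms), the term labels in the toy example are made up rather than real HPO IDs, and "indirectly observed" is taken literally as defined above (at least one direct child is observed, not any deeper descendant).

```python
from collections import deque


def is_observed(node, children, observed):
    """True if `node` is directly observed, or indirectly observed,
    i.e. at least one of its children is directly observed."""
    if node in observed:
        return True
    return any(child in observed for child in children.get(node, ()))


def terms_to_infer(root, children, observed):
    """Breadth-first search from `root`; keep every node that has at least
    two children that are directly or indirectly observed."""
    inferred = set()
    visited = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, ())
        n_observed = sum(1 for k in kids if is_observed(k, children, observed))
        if n_observed >= 2:
            inferred.add(node)
        for k in kids:
            if k not in visited:  # the ontology is a DAG, so a node can be reached twice
                visited.add(k)
                queue.append(k)
    return inferred


# Toy hierarchy with hypothetical labels (not real HPO terms).
children = {
    "ALL": ["PA"],
    "PA": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
}
observed = {"A1", "A2", "B1"}  # the set S of directly observed terms

print(terms_to_infer("ALL", children, observed))
```

In this toy case the result contains "PA" and "A". "ALL" is excluded because it has only one child, and "B" is excluded because only one of its children is observed.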

@pnrobinson thoughts?

pnrobinson commented 4 years ago

I would guess that the correct way to choose the nodes will depend on the individual test, i.e., there is not going to be a one-size-fits-all method. Possibly, once we have the raw data, we can decide which nodes to test using some kind of graph traversal. For instance, given that node i is flagged as interesting, how do the numbers change if I go up to a parent or look at the children of i?
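A rough sketch of that kind of comparison, in Python. This is one possible reading, not a settled method: it assumes "the numbers" are sample counts, that a hypothetical `direct` dict maps each term to the set of sample IDs where it was directly observed, and that a sample annotated with a term also counts for all of that term's ancestors. The term labels and sample IDs are made up.

```python
def samples_with_term(term, children, direct):
    """Samples annotated with `term` or with any of its descendants."""
    seen, stack, samples = set(), [term], set()
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        samples |= direct.get(t, set())
        stack.extend(children.get(t, ()))
    return samples


def neighborhood_counts(term, children, direct):
    """Sample counts for `term`, each of its parents, and each of its children."""
    parents = [p for p, kids in children.items() if term in kids]
    return {
        "self": len(samples_with_term(term, children, direct)),
        "parents": {p: len(samples_with_term(p, children, direct)) for p in parents},
        "children": {
            c: len(samples_with_term(c, children, direct))
            for c in children.get(term, ())
        },
    }


# Toy data with hypothetical term labels and sample IDs.
children = {"PA": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
direct = {"A1": {"s1", "s2"}, "A2": {"s2", "s3"}, "B1": {"s4"}}

print(neighborhood_counts("A", children, direct))
```

For node "A" in this toy data, the count is 3 at the node itself, 4 at its parent "PA", and 2 at each of its children, which is the kind of up/down comparison described above.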