NCATS-Gamma / robokop

Master UI for ROBOKOP
MIT License

How to handle supertypes? #109

Open cbizon opened 6 years ago

cbizon commented 6 years ago

Currently we allow paths to use the union supertypes. So if we want a spot in the query to be either a disease or a phenotypic_feature, we use the type disease_or_phenotypic_feature. This will call all the right stuff, and you'll end up with some nodes that are diseases and some that are phenotypic_features, but in the graph they will all have the type disease_or_phenotypic_feature.

In the case of a cached graph, this is kind of bad, because if you then ask for a subclass (say, disease), you won't find it unless your query knows about the biolink-model. We would prefer that the final knowledge graph contain the pushed-down node type (disease or phenotypic_feature). This is entirely reasonable and doable.
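To make the "pushed down" idea concrete, here is a minimal sketch. It assumes that whatever resolves a node can report which concrete type it found; the `UNION_MEMBERS` table and the `push_down_type` function are hypothetical names for illustration, not existing ROBOKOP code.

```python
# Hypothetical table of union supertypes and their member types.
# The type names come from this thread; the table itself is an assumption.
UNION_MEMBERS = {
    "disease_or_phenotypic_feature": {"disease", "phenotypic_feature"},
}


def push_down_type(query_type, resolved_type):
    """Pick the type to store on a node in the final knowledge graph.

    query_type:    the type used in the query (possibly a union supertype)
    resolved_type: the concrete type reported when the node was found
    """
    members = UNION_MEMBERS.get(query_type)
    if members is None:
        # Not a union type, so there is nothing to push down.
        return query_type
    if resolved_type in members:
        # Store the specific type so later subclass queries can find it.
        return resolved_type
    # Unknown concrete type: fall back to the union type.
    return query_type


print(push_down_type("disease_or_phenotypic_feature", "disease"))
```

With this, a cached graph would hold `disease` and `phenotypic_feature` nodes directly, and a later query for either one would match without consulting the biolink-model.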

There is also an idea that the query should handle this by containing a list of types. So instead of saying "disease_or_phenotypic_feature" you would specify that a node could be one of ("disease", "phenotypic_feature"). The thought is that this makes constructing queries easier? I guess I'm not convinced that this is the simplest approach. If "disease_or_phenotypic_feature" had a shorter name, like "yellow", you'd just ask for a "yellow" node.

The other thing to consider is whether we expect biolink-model to get more complicated. The deepest case at the moment is (indentation indicates subclass):

biological_process_or_molecular_activity
  biological_process
    pathway
  molecular_activity

So in this case, if you want all of these (which I suggest is the most common case) you either put the top node or a list of 3 things (which you have to know you want).
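As a rough illustration of the two options, the sketch below expands the top node into the explicit list of types. The `SUBCLASSES` table hard-codes only the hierarchy shown above, and `expand_type` is a hypothetical helper; a real version would presumably read the hierarchy from the biolink-model.

```python
# Hierarchy copied from the example above (parent -> direct subclasses).
SUBCLASSES = {
    "biological_process_or_molecular_activity": [
        "biological_process",
        "molecular_activity",
    ],
    "biological_process": ["pathway"],
}


def expand_type(type_name, include_self=True):
    """Return type_name (optionally) followed by all of its descendants."""
    result = [type_name] if include_self else []
    for child in SUBCLASSES.get(type_name, []):
        # Depth-first walk so nested subclasses such as pathway are included.
        result.extend(expand_type(child))
    return result


# The "list of 3 things" that the top node stands for:
print(expand_type("biological_process_or_molecular_activity", include_self=False))
```

Either way the query ends up covering the same three types; the difference is whether the person writing the query or the system does the expansion.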

@kennethmorton @patrickkwang opinions?

patrickkwang commented 6 years ago

The approach I was envisioning is difficult when the subclasses do not span the superclass, which is maybe what's happening with biological_process > pathway. For the ontology:

rectangle
  square
circle

we may want to query for rectangles or specifically for squares. I still think that the X_or_Y types are unnecessary, but neither can we unambiguously assign a single type to each node.
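A small sketch of why a single type per node is ambiguous here, and of one possible way around it: match a query type against a node's own type plus all of its ancestors. The node ids, the `PARENT` table, and the `all_types` and `find` functions are made up for illustration; this is not something already implemented.

```python
# Toy ontology from above: square is a subclass of rectangle; circle stands alone.
PARENT = {"square": "rectangle"}

# Hypothetical nodes, each stored with its most specific type.
NODES = {"n1": "square", "n2": "rectangle", "n3": "circle"}


def all_types(leaf_type):
    """Return the type followed by every ancestor of that type."""
    types = [leaf_type]
    while types[-1] in PARENT:
        types.append(PARENT[types[-1]])
    return types


def find(query_type):
    """Return ids of nodes whose type, or any ancestor of it, is query_type."""
    return sorted(n for n, t in NODES.items() if query_type in all_types(t))


print(find("rectangle"))  # squares count as rectangles
print(find("square"))     # only the square
```

A query for `rectangle` has to return `n1` as well as `n2`, so labelling `n1` with only `square` or only `rectangle` loses one of the two queries unless the hierarchy is consulted at query time or stored on the node.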