Open cbizon opened 6 years ago
The approach I was envisioning is difficult when the subclasses do not span the superclass, which is maybe what's happening with biological_process > pathway. For the ontology:
rectangle
    square
circle
we may want to query for rectangles or specifically for squares. I still think that the X_or_Y types are unnecessary, but neither can we unambiguously assign a single type to each node.
Currently we allow paths to use the union supertypes. So if we want a spot in the query to be either a disease or a phenotypic_feature, we use the type disease_or_phenotypic_feature. This will call all the right stuff, and you'll end up with some nodes that are diseases and some that are phenotypic_features, but in the graph they will all have the type disease_or_phenotypic_feature.
In the case of a cached graph, this is kind of bad, because if you then ask for a subclass, you won't find it (unless your query knows about the biolink-model). We would prefer that the final knowledge graph contain the pushed-down node type (disease or phenotypic_feature). This is entirely reasonable and doable.
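To make the cached-graph problem concrete, here is a minimal sketch. The `PARENT` map, the `is_a` and `find_nodes` helpers, and the node ids are all hypothetical stand-ins for illustration, not the actual biolink-model or knowledge graph API.

```python
# Hypothetical child -> parent map; a stand-in for the biolink-model hierarchy.
PARENT = {
    "disease": "disease_or_phenotypic_feature",
    "phenotypic_feature": "disease_or_phenotypic_feature",
}

def is_a(node_type, query_type):
    """True if node_type equals query_type or is one of its descendants."""
    while node_type is not None:
        if node_type == query_type:
            return True
        node_type = PARENT.get(node_type)
    return False

def find_nodes(graph, query_type, use_hierarchy=True):
    """Return ids of nodes whose stored type satisfies query_type."""
    if use_hierarchy:
        return [n["id"] for n in graph if is_a(n["type"], query_type)]
    # Without hierarchy knowledge, only exact type matches are found.
    return [n["id"] for n in graph if n["type"] == query_type]

# Made-up nodes: the same two nodes, stored with the union type
# versus stored with the pushed-down type.
union_typed = [
    {"id": "n1", "type": "disease_or_phenotypic_feature"},
    {"id": "n2", "type": "disease_or_phenotypic_feature"},
]
pushed_down = [
    {"id": "n1", "type": "disease"},
    {"id": "n2", "type": "phenotypic_feature"},
]

# Asking for the subclass against union-typed nodes finds nothing.
subclass_on_union = find_nodes(union_typed, "disease")
# Asking for the subclass against pushed-down nodes finds the disease.
subclass_on_pushed = find_nodes(pushed_down, "disease")
# Asking for the union against pushed-down nodes works if the query
# knows the hierarchy, and fails if it only matches exact type names.
union_on_pushed = find_nodes(pushed_down, "disease_or_phenotypic_feature")
union_exact_only = find_nodes(
    pushed_down, "disease_or_phenotypic_feature", use_hierarchy=False
)
```

In this sketch, pushing the type down does not remove the need for hierarchy knowledge; it moves it from the subclass query to the supertype query.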
There is also an idea that the query should handle this by containing a list of types. So instead of saying "disease_or_phenotypic_feature" you would specify that a node could be one of ("disease", "phenotypic_feature"). The thought is that this makes constructing queries easier? I guess I'm not convinced that this is the simplest approach. If "disease_or_phenotypic_feature" had a shorter name, like "yellow", you'd just ask for a "yellow" node.
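One way to compare the two query styles is to normalize both to the same set of concrete types. The sketch below assumes a hypothetical `CHILDREN` map and hypothetical helpers `concrete_types` and `normalize`; none of these names come from biolink-model or the existing query code.

```python
# Hypothetical parent -> children map; a stand-in for the biolink-model hierarchy.
CHILDREN = {
    "disease_or_phenotypic_feature": ["disease", "phenotypic_feature"],
}

def concrete_types(type_name):
    """Expand a type into the set of leaf types beneath it (itself if a leaf)."""
    children = CHILDREN.get(type_name)
    if not children:
        return {type_name}
    leaves = set()
    for child in children:
        leaves |= concrete_types(child)
    return leaves

def normalize(query_type):
    """Accept either a single type name or a list/tuple of type names."""
    names = [query_type] if isinstance(query_type, str) else query_type
    leaves = set()
    for name in names:
        leaves |= concrete_types(name)
    return leaves

# The named union type and the explicit list resolve to the same set.
by_name = normalize("disease_or_phenotypic_feature")
by_list = normalize(("disease", "phenotypic_feature"))
```

Under these assumptions the two forms are interchangeable, so the choice is mostly about who has to know the hierarchy: the query author (list) or the query engine (named type). Note that expanding to leaves assumes the subclasses span the superclass, which the rectangle/square example shows is not always true.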
The other thing to consider is whether we expect biolink-model to get more complicated. The deepest case at the moment is (indentation indicates subclass)
So in this case, if you want all of these (which I suggest is the most common case) you either put the top node or a list of 3 things (which you have to know you want).
@kennethmorton @patrickkwang opinions?