biolink / biolink-model-toolkit

A collection of useful python functions for looking up information and working with the Biolink Model
https://biolink.github.io/biolink-model-toolkit/
BSD 3-Clause "New" or "Revised" License

Problems retrieving inverse predicates #145

Open colleenXu opened 1 year ago

colleenXu commented 1 year ago

I can't tell if this is an issue with the biolink-model or the toolkit, but I'm having trouble retrieving inverse predicates for non-symmetric predicates.

Previously (biolink-model 3.1.1, and maybe bmt 0.9.0), I could use statements like bmt_tool.get_element('causes').inverse to retrieve the inverse predicate if it existed (and it would return nothing or None if the inverse didn't exist). It looked like it took the input predicate with underscores, not spaces.

However, now that kind of statement always returns nothing / None.


First, I looked at the documentation and saw the method has_inverse. However, this function always seems to return False, even when I would expect it to return True. For example, bmt_tool.has_inverse('causes') evaluates to False.

Second, I searched in the repo, and I saw the method get_inverse.

It seems to return the inverse predicate if it exists, and return None if there's no inverse because the input predicate was symmetrical.

However, I've noticed:

colleenXu commented 1 year ago

I tested with 1.1.0 and with 1.1.1, which just came out.

sierra-moxon commented 1 year ago

@colleenXu, you are correct: there are two methods. has_inverse checks whether the slot itself has the "inverse" metadata tag. In Biolink, we only add the inverse tag to predicates that are not in the canonical predicate direction. get_element('causes').inverse is equivalent to this method. I think renaming the method would help here.

For example, causes is canonical, so it does not have the inverse: metadata tag. caused_by is the inverse, so it does have the tag (inverse: causes).

Consequently, these test cases all pass:

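    from bmt import Toolkit

    toolkit = Toolkit()  # loads the toolkit's default Biolink Model version
    # has_inverse() is True only for slots that carry their own "inverse:" tag
    assert toolkit.has_inverse("completed by")
    assert not toolkit.has_inverse("causes")
    assert toolkit.has_inverse("caused by")
    assert not toolkit.has_inverse("this_does_not_exist")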
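To illustrate the tag-based semantics without depending on a particular model release, here is a minimal sketch over a hand-written dictionary. SLOTS and has_inverse_tag are hypothetical stand-ins for illustration, not bmt's data structures or implementation; the point is only that checking a slot's own inverse tag gives False for the canonical direction (causes) and True for the tagged, non-canonical one (caused by).

```python
# Hypothetical, simplified stand-in for Biolink slot metadata (not the real model).
# Only the non-canonical direction of a predicate pair carries an "inverse" tag.
SLOTS: dict[str, dict[str, str]] = {
    "causes": {},                        # canonical direction: no inverse tag
    "caused by": {"inverse": "causes"},  # non-canonical direction: tagged
    "subclass of": {},                   # no inverse defined at all
}


def has_inverse_tag(name: str) -> bool:
    """Mimic the tag check: True only if this slot itself carries an inverse tag."""
    return "inverse" in SLOTS.get(name, {})


if __name__ == "__main__":
    for name in ["causes", "caused by", "subclass of", "this_does_not_exist"]:
        print(f"{name!r}: {has_inverse_tag(name)}")
```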

The get_inverse method is there for convenience: you can pass any predicate, regardless of whether it is symmetric or canonical, and get its inverse.

Consequently, all these tests pass (note: the upper-case names are just Python constants, reused throughout the tests in place of the literal predicate-name strings):

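    from bmt import Toolkit
    toolkit = Toolkit()
    # predicate-name constants (values assumed to match the test module)
    ACTIVE_IN = "active in"
    HAS_ACTIVE_COMPONENT = "has active component"
    assert toolkit.get_inverse(ACTIVE_IN) == HAS_ACTIVE_COMPONENT
    assert toolkit.get_inverse(HAS_ACTIVE_COMPONENT) == ACTIVE_IN
    sd = toolkit.get_element(ACTIVE_IN)  # slot definition for "active in"
    assert toolkit.get_inverse(sd.name) == HAS_ACTIVE_COMPONENT
    assert toolkit.get_inverse("causes") == "caused by"
    assert toolkit.get_inverse("subclass of") is None  # no inverse defined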
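get_inverse's behavior can be pictured as a lookup in both directions. The sketch below illustrates that idea over a hypothetical hand-written SLOTS dictionary; get_inverse_sketch is an illustrative name, not bmt's actual code. If the slot carries an inverse tag, that value is returned directly; otherwise the table is scanned for a slot whose tag points back at the input, and None is returned when neither exists (as with subclass of).

```python
from typing import Optional

# Hypothetical, simplified stand-in for Biolink slot metadata (not the real model).
SLOTS: dict[str, dict[str, str]] = {
    "causes": {},                        # canonical: no inverse tag
    "caused by": {"inverse": "causes"},  # non-canonical: names its inverse
    "subclass of": {},                   # no inverse in either direction
}


def get_inverse_sketch(name: str) -> Optional[str]:
    """Return the inverse of `name` in either direction, or None if none exists."""
    meta = SLOTS.get(name)
    if meta is None:
        return None  # unknown slot name
    # Non-canonical direction: the slot names its inverse directly.
    if "inverse" in meta:
        return meta["inverse"]
    # Canonical direction: find the slot whose inverse tag points back here.
    for other, other_meta in SLOTS.items():
        if other_meta.get("inverse") == name:
            return other
    return None
```

Scanning every slot is linear in the number of slots; if lookups are frequent, a reverse index could be built once up front instead.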

These methods are functioning as expected, but I can see how some tweaks to the method names/docstrings would help!

Per your other questions/statements:

subclass of does not currently have an inverse in the Biolink Model. There has been much discussion of what its inverse should be, and it has gone back and forth a few times; see https://github.com/biolink/biolink-model/pull/396 and https://github.com/biolink/biolink-model/pull/1179, among others. We have not reached a conclusion yet, so your input is welcome.

quantifier qualifier isn't technically a predicate in the model; it's an edge property. I don't know the history of why SEMMED:MEASURES is mapped to it, but my hypothesis is that we are collapsing this SEMMEDDB edge into a property and value on another edge.

I'd be happy to add, as a feature, the ability to query slot names using underscores.
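Until something like that exists, a caller holding snake_case names (or CURIEs such as biolink:caused_by) could normalize them before calling has_inverse or get_inverse. This is a minimal sketch: normalize_predicate_name is a hypothetical helper, not part of bmt, and it assumes the only differences from the model's slot names are an optional biolink: prefix and underscores in place of spaces.

```python
def normalize_predicate_name(name: str) -> str:
    """Hypothetical helper: 'biolink:caused_by' or 'caused_by' -> 'caused by'."""
    # Drop an optional "biolink:" CURIE prefix (str.removeprefix needs Python 3.9+).
    name = name.removeprefix("biolink:")
    # Assumption: slot names use spaces where the snake_case form uses underscores.
    return name.replace("_", " ").strip()
```

For example, toolkit.get_inverse(normalize_predicate_name("caused_by")) would then query "caused by".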