Knowledge-Graph-Hub / neat-ml

Network Embedding All the Things
BSD 3-Clause "New" or "Revised" License

In link prediction, filter nodes by prefix or other slots #68

Open caufieldjh opened 2 years ago

caufieldjh commented 2 years ago

Some graphs have nodes we would like to filter for, but their Biolink categories don't distinguish them clearly:

PR:000002977    biolink:NamedThing                              Graph                                             owl:Class

So we would like to specify a filter by prefix rather than by category. This could be controlled by a flag in the link_node_types: block of the config.
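As a rough illustration of the prefix idea (not neat-ml's actual implementation), a filter could split each node ID at the first colon and keep only the requested prefixes. The function names `curie_prefix` and `filter_by_prefix` are hypothetical and exist only for this sketch:

```python
def curie_prefix(node_id: str) -> str:
    """Return the part of a CURIE before the first colon ('' if there is no colon)."""
    prefix, sep, _ = node_id.partition(":")
    return prefix if sep else ""


def filter_by_prefix(node_ids, prefixes):
    """Keep only node IDs whose CURIE prefix is in `prefixes` (hypothetical helper)."""
    wanted = set(prefixes)
    return [node_id for node_id in node_ids if curie_prefix(node_id) in wanted]


# Node IDs taken from the examples in this issue, plus one ID without a prefix.
nodes = ["PR:000002977", "XPO:0134172", "no_prefix_node"]
pr_nodes = filter_by_prefix(nodes, ["PR"])
```

Here `pr_nodes` would contain only the `PR:` node, even though both prefixed nodes share the category biolink:NamedThing.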

Similarly, it would be nice to be able to filter by other node slots/properties:

XPO:0134172     biolink:NamedThing      increased apoptosis in simple columnar epithelium       An increased occurrence of apoptotic process in simple columnar epithelium.                Graph

This could be as simple as a regex applied to the string value in a named column, e.g., match every node whose value contains the string "apoptosis".
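A minimal sketch of that regex-on-a-column idea, using only the Python standard library. The column names (`id`, `category`, `name`) are assumed for illustration and may not match the actual node file, and `filter_rows_by_regex` is a hypothetical helper, not part of neat-ml:

```python
import csv
import io
import re


def filter_rows_by_regex(tsv_text, column, pattern):
    """Return rows of a tab-separated node list whose `column` value matches `pattern`."""
    regex = re.compile(pattern)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Missing or empty values are treated as empty strings, so they only
    # match patterns that can match an empty string.
    return [row for row in reader if regex.search(row.get(column) or "")]


# Small in-memory node list; the header names are assumptions for this sketch.
NODES_TSV = (
    "id\tcategory\tname\n"
    "XPO:0134172\tbiolink:NamedThing\tincreased apoptosis in simple columnar epithelium\n"
    "PR:000002977\tbiolink:NamedThing\t\n"
)

matches = filter_rows_by_regex(NODES_TSV, "name", r"apoptosis")
```

With this input, `matches` holds the single `XPO:0134172` row. Passing the column name and pattern as plain strings would make it straightforward to read both from a config block.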

caufieldjh commented 2 years ago

@LucaCappelletti94 you may have already solved this problem in terms of filtering graph nodelists by CURIE prefix and mapping it to a namespace

LucaCappelletti94 commented 2 years ago

In ensmallen it is possible to filter by the prefix, but I do not know what you mean by mapping it to a namespace.

caufieldjh commented 2 years ago

Same thing as far as we're concerned: namespace == prefix, at least as far as node IDs go.

LucaCappelletti94 commented 2 years ago

Ok, then graph.filter_from_names(...) has all of the kwargs you may desire for this sort of goal. It should be available in the latest nightly build if I am not mistaken (0.7.0.dev20).