Knowledge-Graph-Hub / neat-ml

Network Embedding All the Things
BSD 3-Clause "New" or "Revised" License

Example YAMLs should use different value for loading node types #54

Closed: caufieldjh closed this issue 2 years ago

caufieldjh commented 2 years ago

In the config YAMLs, when we want to load node types from a node list, the Ensmallen graph loader expects the parameter node_list_node_types_column. We currently use node_types_column. Ensmallen does accept that parameter, but it interprets it as "The name of the column of the node types file from where to load the node types." The key words are "node types file": node_types_column refers to a column in a separate node types file, not in the node list. We are planning to create the node types file only as needed, so the YAMLs should use node_list_node_types_column to name the column of the node list in which nodes are assigned their categories.
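A minimal sketch of migrating an existing config, assuming the YAML section has already been parsed (for example with PyYAML) into a plain dict whose keys are passed straight through as keyword arguments to Graph.from_csv. The helper name migrate_node_type_key is hypothetical and not part of neat-ml, and the rename only makes sense when the categories live in the node list itself, which is the case this issue describes.

```python
# Hypothetical helper: rename the legacy key in a graph-loader config dict.
# Assumption: the parsed YAML keys are forwarded unchanged to Graph.from_csv.

LEGACY_KEY = "node_types_column"              # column of a separate node types file
NODE_LIST_KEY = "node_list_node_types_column"  # column of the node list itself


def migrate_node_type_key(graph_args: dict) -> dict:
    """Return a copy of graph_args with node_types_column renamed.

    Only appropriate when the node categories are stored in the node list,
    not in a separate node types file.
    """
    migrated = dict(graph_args)  # avoid mutating the caller's config
    if LEGACY_KEY in migrated:
        if NODE_LIST_KEY in migrated:
            # Both keys present is ambiguous; refuse rather than guess.
            raise ValueError(
                f"config sets both {LEGACY_KEY!r} and {NODE_LIST_KEY!r}"
            )
        migrated[NODE_LIST_KEY] = migrated.pop(LEGACY_KEY)
    return migrated


old_args = {
    "directed": False,
    "node_path": "tests/resources/test_graphs/pos_train_nodes.tsv",
    "nodes_column": "id",
    "node_types_column": "category",
    "default_node_type": "biolink:NamedThing",
}
new_args = migrate_node_type_key(old_args)
print(new_args)
```

Raising on a config that sets both keys is a deliberate choice: silently preferring one of them could hide a real node types file that someone intended to use.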

justaddcoffee commented 2 years ago

@caufieldjh thanks! You have solved this problem: apparently we just need to use node_list_node_types_column instead of node_types_column. This must have changed recently in Ensmallen.

To show that you've solved the issue:

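from ensmallen import Graph  # Ensmallen's graph class

# Load the test graph; node categories come from the 'category' column
# of the node list itself, not from a separate node types file.
g = Graph.from_csv(
    directed=False,
    node_path='tests/resources/test_graphs/pos_train_nodes.tsv',
    edge_path='tests/resources/test_graphs/pos_train_edges.tsv',
    verbose=True,
    nodes_column='id',
    node_list_node_types_column='category',  # previously node_types_column
    default_node_type='biolink:NamedThing',  # used for nodes without a category
    sources_column='subject',
    destinations_column='object',
    default_edge_type='biolink:related_to'
)
g  # in a notebook or REPL, this displays the graph report below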
[snip]
Node types
The graph has 2 node types, which are [biolink:Protein](https://biolink.github.io/biolink-model/docs/Protein.html) (19354 nodes, 52.08%) and [biolink:Gene](https://biolink.github.io/biolink-model/docs/Gene.html) (17809 nodes, 47.92%).
[snip]
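To cross-check the node-type counts in the report independently of Ensmallen, here is a minimal sketch that tallies the category column of a node-list TSV using Python's csv module. It assumes a header row with id and category columns and one category per node; empty categories fall back to biolink:NamedThing, following the default_node_type in the call above. The function name node_type_summary is hypothetical, and this is not how Ensmallen itself builds its report.

```python
import csv
import io
from collections import Counter


def node_type_summary(node_tsv, nodes_column="id", type_column="category",
                      default_type="biolink:NamedThing"):
    """Return {node_type: (count, percent)} for a node-list TSV.

    Assumes one category per node; empty or missing categories fall back
    to default_type. Each node id is counted once.
    """
    reader = csv.DictReader(node_tsv, delimiter="\t")
    counts = Counter()
    seen = set()
    for row in reader:
        node_id = row[nodes_column]
        if node_id in seen:  # skip duplicate node rows
            continue
        seen.add(node_id)
        counts[row.get(type_column) or default_type] += 1
    total = sum(counts.values())
    # Percentages rounded to two decimals, like the report's 52.08%.
    return {t: (n, round(100 * n / total, 2)) for t, n in counts.most_common()}


sample = io.StringIO(
    "id\tcategory\n"
    "P1\tbiolink:Protein\n"
    "P2\tbiolink:Protein\n"
    "G1\tbiolink:Gene\n"
    "X1\t\n"  # no category: falls back to the default type
)
print(node_type_summary(sample))
```

For the real node list, open the file with open(path, newline='') and pass the handle in. If the assumptions above hold for that file, the percentages should agree with the 52.08% and 47.92% split shown in the report.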