caufieldjh closed this issue 2 years ago.
@caufieldjh thanks! You have solved this problem: apparently we just need to use `node_list_node_types_column` instead of `node_types_column`. This must have changed recently in Ensmallen.
To show that you've solved the issue:
```python
from ensmallen import Graph

g = Graph.from_csv(
    directed=False,
    node_path='tests/resources/test_graphs/pos_train_nodes.tsv',
    edge_path='tests/resources/test_graphs/pos_train_edges.tsv',
    verbose=True,
    nodes_column='id',
    # Read each node's type from the 'category' column of the node list itself
    node_list_node_types_column='category',
    default_node_type='biolink:NamedThing',
    sources_column='subject',
    destinations_column='object',
    default_edge_type='biolink:related_to'
)
g  # display the graph report
```
[snip]
Node types
The graph has 2 node types, which are [biolink:Protein](https://biolink.github.io/biolink-model/docs/Protein.html) (19354 nodes, 52.08%) and [biolink:Gene](https://biolink.github.io/biolink-model/docs/Gene.html) (17809 nodes, 47.92%).
[snip]
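To make the behaviour above concrete, here is a minimal pure-Python sketch of what reading node types from a node-list column amounts to: take the `category` column of each row in the node list, fall back to `default_node_type` when it is empty, then count and report percentages. This is an illustrative emulation, not Ensmallen's implementation; the function name `summarize_node_types`, the sample rows, and the fallback rule for empty cells are assumptions.

```python
import csv
import io
from collections import Counter


def summarize_node_types(tsv_text,
                         node_list_node_types_column="category",
                         default_node_type="biolink:NamedThing"):
    """Hypothetical emulation: count node types taken from a node-list column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        # Assumption: an empty or missing category falls back to the default type.
        node_type = (row.get(node_list_node_types_column) or "").strip()
        counts[node_type or default_node_type] += 1
    total = sum(counts.values())
    # Map each type to (node count, percentage rounded to 2 decimals).
    return {t: (n, round(100 * n / total, 2)) for t, n in counts.most_common()}


# Hypothetical node list with an 'id' and a 'category' column.
sample = ("id\tcategory\n"
          "P1\tbiolink:Protein\n"
          "P2\tbiolink:Protein\n"
          "G1\tbiolink:Gene\n"
          "X1\t\n")
summary = summarize_node_types(sample)
print(summary)

# The percentages in the report follow from the same arithmetic.
total = 19354 + 17809  # 37163 nodes in total
print(round(100 * 19354 / total, 2), round(100 * 17809 / total, 2))
```

The last two lines recompute the 52.08% / 47.92% split from the node counts in the report.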
In the config YAMLs, when we want to load node types from a node list, the Ensmallen graph loader expects to see `node_list_node_types_column`. We currently use `node_types_column`. Ensmallen can certainly take this parameter, but it interprets it as "The name of the column of the **node types file** from where to load the node types." (emphasis mine). We are planning to create the node types file as needed, so the YAMLs should use `node_list_node_types_column` to specify the column where nodes are assigned categories.
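A minimal sketch of the YAML fix, assuming the graph section of a config YAML has already been parsed into a flat dict of `Graph.from_csv` keyword arguments (that layout and the helper name `migrate_node_type_column` are hypothetical). It renames `node_types_column` to `node_list_node_types_column` and leaves the input untouched.

```python
def migrate_node_type_column(graph_args):
    """Hypothetical helper: return a copy of loader kwargs using the node-list key.

    Assumes graph_args is a flat dict of Graph.from_csv keyword arguments
    parsed from a config YAML.
    """
    args = dict(graph_args)  # copy so the caller's dict is not mutated
    if "node_types_column" in args:
        if "node_list_node_types_column" in args:
            # Ambiguous config: refuse to guess which column is intended.
            raise ValueError(
                "both node_types_column and node_list_node_types_column are set"
            )
        args["node_list_node_types_column"] = args.pop("node_types_column")
    return args


old = {
    "nodes_column": "id",
    "node_types_column": "category",
    "default_node_type": "biolink:NamedThing",
}
new = migrate_node_type_column(old)
print(new)
```

Raising on a config that sets both keys is a deliberate choice in this sketch: silently keeping `node_types_column` would make Ensmallen look for a separate node types file.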