Closed 13bmartens closed 1 year ago
Hi @13bmartens! The node file should have a column that assigns each node its node types, something like:
node_a,node type of node a
node_b,first node type of node b|second node type of node b
Note that multiple node types can be provided for a node by separating them with |, as done for node_b in this example.
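To make the format concrete, here is a minimal sketch of how such a node list could be read by hand. It uses only the Python standard library and is not how GRAPE parses the file internally; the `parse_node_list` helper and the `NODE_LIST` sample are hypothetical names introduced for illustration.

```python
import csv
import io

# Hypothetical two-column node list in the format described above
# (no header: node name, then its node types separated by "|").
NODE_LIST = """node_a,node type of node a
node_b,first node type of node b|second node type of node b
"""


def parse_node_list(text, separator="|"):
    """Map each node name to the list of its node types."""
    node_types = {}
    for node_name, raw_types in csv.reader(io.StringIO(text)):
        # A node may carry several node types, joined by the separator.
        node_types[node_name] = raw_types.split(separator)
    return node_types


nodes = parse_node_list(NODE_LIST)
```

Here `nodes["node_b"]` holds two node types, while `nodes["node_a"]` holds one.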
The node types file should contain only the unique node types. Its primary use is to let you specify the node types numerically in the nodes CSV, so in the example above you could write:
node_a,0
node_b,1|2
and the associated node types file would look like this:
node type of node a
first node type of node b
second node type of node b
Using numeric node type IDs together with the node types file is preferable but not necessary. It mostly makes the CSVs smaller and more compressible (so you can move the data around a bit more easily), and loading is much faster because we can make more assumptions about the data being loaded. Moreover, a smaller file means less data to read, which reduces the IO bottleneck.
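The conversion from node type names to numeric IDs can be sketched as follows. This is an illustrative helper, not part of GRAPE: `encode_node_types` is a hypothetical name, and it assumes that an ID is simply the zero-based position of the node type in the node types file, as in the example above.

```python
def encode_node_types(nodes, separator="|"):
    """Return (node types file lines, encoded node list lines).

    `nodes` maps each node name to the list of its node type names.
    """
    vocabulary = {}  # node type name -> numeric ID, in first-seen order
    rows = []
    for node_name, type_names in nodes.items():
        # Assign the next free ID the first time a node type is seen.
        ids = [vocabulary.setdefault(name, len(vocabulary)) for name in type_names]
        rows.append(f"{node_name},{separator.join(map(str, ids))}")
    return list(vocabulary), rows


example_nodes = {
    "node_a": ["node type of node a"],
    "node_b": ["first node type of node b", "second node type of node b"],
}
node_type_lines, node_lines = encode_node_types(example_nodes)
```

With this input, `node_lines` reproduces the numeric node list shown above and `node_type_lines` reproduces the associated node types file.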
I hope this answers your question.
I have added a check to the CSV reader that raises an error when the node type (or edge type) file path is provided but no other parameter binding node types to the node list is provided. In a future version, parametrizing the loader in this incomplete way will produce an explanation of how to correct the parametrization.
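As a rough illustration of the kind of check described, and not the actual implementation in the library, a user-side guard could look like the sketch below. The function name `check_node_type_parameters` is hypothetical, and it only considers the `node_list_node_types_column` parameter; the real loader may accept other ways of binding node types to the node list.

```python
def check_node_type_parameters(node_type_path=None, node_list_node_types_column=None):
    """Raise if a node types file is given without a node types column.

    Hypothetical user-side check; it only looks at the two parameters
    named here and is not taken from the library's source.
    """
    if node_type_path is not None and node_list_node_types_column is None:
        raise ValueError(
            "A node types file was provided, but the node list has no "
            "node types column. Set node_list_node_types_column so that "
            "each node can be bound to its node types."
        )


# A complete parametrization passes silently.
check_node_type_parameters(
    node_type_path="node_types.csv",
    node_list_node_types_column="node_type",
)
```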
Thank you for the quick reply @LucaCappelletti94!
I got the example working using your input:
import polars as pl
from grape import Graph

# Edge list: one row per edge, with a header.
pl.DataFrame({
    'source': ['A', 'A', 'A', 'A', 'A', 'F', 'F', 'F', 'A', 'F'],
    'destination': ['B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'J'],
}).write_csv('edges.csv')

# Node types file: only the unique node types.
pl.DataFrame({
    'node_type': ['link', 'sat'],
}).write_csv('node_types.csv')

# Node list: each node together with its node type.
pl.DataFrame({
    'node_name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'node_type': ['link', 'sat', 'sat', 'sat', 'sat', 'link', 'sat', 'sat', 'sat', 'sat'],
}).write_csv('node_names.csv')

graph = Graph.from_csv(
    # Edges
    edge_path="edges.csv",
    sources_column="source",
    destinations_column="destination",
    edge_list_header=True,
    # Nodes
    node_path="node_names.csv",
    nodes_column="node_name",
    node_list_header=True,
    # This column binds each node to its node type.
    node_list_node_types_column="node_type",
    # Node types
    node_type_path="node_types.csv",
    node_types_column="node_type",
    node_type_list_header=True,
    directed=False,
)
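Before loading, it can help to verify that the three files agree with each other. The following is a minimal sketch using only the standard library, independent of GRAPE; the helper names `read_column` and `find_inconsistencies` are hypothetical, the column names are those from the snippet above, and it assumes one node type per node (no | separator).

```python
import csv
import io


def read_column(csv_text, column):
    """Return the values of one named column from CSV text with a header."""
    return [row[column] for row in csv.DictReader(io.StringIO(csv_text))]


def find_inconsistencies(edges_csv, nodes_csv, node_types_csv):
    """Return (node types missing from the node types file,
    edge endpoints missing from the node list), both sorted."""
    known_types = set(read_column(node_types_csv, "node_type"))
    known_nodes = set(read_column(nodes_csv, "node_name"))
    used_types = set(read_column(nodes_csv, "node_type"))
    endpoints = set(read_column(edges_csv, "source"))
    endpoints |= set(read_column(edges_csv, "destination"))
    return sorted(used_types - known_types), sorted(endpoints - known_nodes)


# Small sample in the same layout as the files written above.
EDGES = "source,destination\nA,B\nA,F\nF,J\n"
NODES = "node_name,node_type\nA,link\nB,sat\nF,link\nJ,sat\n"
NODE_TYPES = "node_type\nlink\nsat\n"

missing_types, missing_nodes = find_inconsistencies(EDGES, NODES, NODE_TYPES)
```

For real files, the same function can be fed the contents read with `open(path).read()`. Two empty lists mean every node type and every edge endpoint is declared.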
Happy to hear that! I will close the issue then. Feel free to re-open it if you run into related problems again. I am also available on the GRAPE Discord server.
Hi Team, thank you for the awesome library!
I am trying to import a very basic dataset for a POC and am struggling with the from_csv method.
I want to construct a graph using my own data:
This results in three CSV files, each with a header.
I am then constructing the graph using the following snippet:
When I run
graph.get_node_type_names()
I get the error:
Am I doing anything wrong?
Thanks for the time!