VHRanger / CSRGraph

A tiny library for large graphs
MIT License
111 stars 17 forks source link

Output which edge was passed through in a random walk #16

Closed Filco306 closed 2 years ago

Filco306 commented 2 years ago

Hello,

I am using your package (which I really like by the way) to perform some random walks. I am performing walks on an networkx.MultiDiGraph, and my edges have specific attributes (integer attributes). Is there a way to output either the edge ids or these attributes from the random walk?

Thank you in advance!

Best Filip

VHRanger commented 2 years ago

So the edge IDs are actually what's returned here:

https://github.com/VHRanger/CSRGraph/blob/master/csrgraph/graph.py#L252

You might mean that you want the edge names rather?

The csrgraph object is largely a CSR matrix (edge ID, edge pointer, weights) plus a node name array (co-indexed to edge IDs, which are just an internal index, really) the node name array and associated logic is what lets us be compatible to networkX or other more intuitive graph representations.

Filco306 commented 2 years ago

Sorry, isn't the walks an np array returning the nodes of a networkx graph it has passed through? I thought they were the ids of the nodes and not the edges?

Filco306 commented 2 years ago

Ah, I think I understood your answer. So then I would most likely be able to change parts of the source code and fix it myself. Thank you!

Filco306 commented 2 years ago

Actually, let me take that back :p still not entirely certain what you mean

VHRanger commented 2 years ago

If you change parts of the source code, please make it into a pull request.

The problem is that this package is designed to be scalable, so mapping the node back to the node "names" (which might just be different integers in NetworkX - NetworkX has no ordering guarantee over the nodes) can be slow for massive random walk jobs.

You can see how I did it in the nodevectors here:

https://github.com/VHRanger/nodevectors/blob/master/nodevectors/node2vec.py#L115

Which is pretty fast, but far from optimal.

I tried making a mapper nodeID -> node Name for CSRGraphs directly in Numba, but Numba has a hard time supporting mixed type dictionaries (names can be anything coming in since they come from NetworkX where they're arbitrary python objects)

Filco306 commented 2 years ago

Interesting. I think that problem can be easily overcome through simply using the indices and discarding the names, would it not? Alternatively, one could enforce integer "names", i.e., ids. I'll look into it when I have some spare time on my hands :)

VHRanger commented 2 years ago

So we have to maintain the names for compatibility with other libraries (networkX, etc.) as well as helping the graph have a dict-like interface from the outside. For non-devs on this package it might not be super intuitive, however.

I'll close the issue for now, because I don't see an action point, and I'm cleaning up the backlog.

Please open another issue if this persists!