Closed winni2k closed 3 years ago
I'm currently working on option 2. See tests/test_graph.py for the currently implemented API. I think it should be doable to get it in a state such that most NetworkX algorithms work. For now, I'm focusing on colored compacted De Bruijn graphs only, and ignore the non-color variant.
Things I still need to think about:
Thanks for getting back to me!
My understanding of networkx is that it really wants both succ
and pred
methods to be available. I notice that pred
does not seem to be tested in https://github.com/broadinstitute/pyfrost/blob/master/tests/test_graph.py. Is pred
non-trivial to implement from a bifrost data structure?
Ha no, pred
is working for the most part, but one of my tests was failing, and it seems that my understanding of the data structure is still incomplete (see https://github.com/pmelsted/bifrost/issues/17), so I have some refactoring to do :)
As mentioned, this project is in a very very early stage, and still in active development.
Did you manage to resolve the issue with pred
, @lrvdijk?
I was going to try and apply some networkx algorithms such as all_simple_paths to pyfrost with an eye to creating a tutorial section on graph travelsal. I think I can borrow many tests for graph traversal from cortexpy. Should I write them up and make a PR? Do I understand correctly that you would expect algorithms to work out of the box if both succ
and pred
are implemented?
A side question: Do you think that networkx.subgraph_view
would work with a pyfrost graph?
Yes pred
works, as do many graph functions like neighbors()
, predecessors()
, successors()
etc.
Here in the tests folder I use networkx's breadth first search and depth first search on the graph: https://github.com/broadinstitute/pyfrost/blob/master/tests/test_dfs_bfs.py
I haven't extensively tested other networkx algorithms yet, the coming weeks I will use this library in my own research projects and see what missing features are the most important, and prioritize based on that what I will develop next. But in general, I believe most NetworkX algorithms are written in such a way that is pretty agnostic to the underlying data structure, as long as the algorithms can obtain node metadata and its neighbors.
The builtin subgraph_view
is not going to work with the current pyfrost implementation, as it assigns the private _succ
and _pred
dicts of a normal networkx.DiGraph
to filtered dicts, which doesn't work for pyfrost because it doesn't have dicts as base data structure. I'll have to come up with a custom implementation, which is definitely high on my list, but need to flesh it out a bit more.
If you are using a dict-like structure for an adjacency_list approach to storing your graph data, then check out the "dict_factory" features of NetworkX's Graph and DiGraph classes. You should be able to "just" drop in your backend in place of the NetworkX Python dict-of-dict-of-dict and have all algorithms work.
If your graph structure is not adjacency oriented, then you'll have more work to make a NetworkX interface. :}
Thanks @dschult!
I've now landed a refactor of the code that supports subgraph_view. I emulate the _succ, _pred and _node dicts in C++, but other than that it's just a proper NetworkX DiGraph. See the latest beta release!
Hi! I'm opening this issue to discuss the networkx-compatible API design.
I've done something like this for Cortex and McCortex files over at https://github.com/winni2k/cortexpy, and I am interested in which approach you were hoping to take. So far I see two options: 1. load bifrost data into RAM and turn it into a dict-of-dict-of-dicts to be wrapped by nx.Graph, or 2. write a nx.Graph or nx.DiGraph duck type that loads data from bifrost on disk on demand.
cortexpy
but I never quite got it right. It's unfortunate that there isn't an abstract base class for nx.Graph and its compatriots.