fawaz-dabbaghieh closed this issue 2 years ago
Difficult to tell; it depends on the graph and on the system. But indeed the graph is relatively large, and Gfapy is currently written entirely in Python, so it has its limits...
Maybe you could try setting vlevel=0 in the from_file call? This disables validation, so loading should be faster.
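For reference, the suggestion above could look roughly like the sketch below. It assumes gfapy is installed and that the call is gfapy.Gfa.from_file with the vlevel keyword mentioned above. The helper names time_loader and load_without_validation are made up for illustration, and the path in the usage note is a placeholder.

```python
import time


def time_loader(loader, path):
    """Call loader(path) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = loader(path)
    return result, time.perf_counter() - start


def load_without_validation(path):
    """Load a GFA file with gfapy, skipping validation (vlevel=0)."""
    # Imported here so the timing helper stays usable without gfapy installed.
    import gfapy  # third-party; assumed to be installed

    return gfapy.Gfa.from_file(path, vlevel=0)
```

Calling time_loader(load_without_validation, "graph.gfa") on the real file would then show how long the load takes with validation turned off. Whether that is fast enough for a graph of this size still depends on the system.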
I see! Thank you for the very quick response!
Alternatively, you could consider using my library textformats (not yet published, but publicly available), which also has a Python interface and a GFA1 specification (file https://github.com/ggonnella/textformats/spec/gfa/gfa1.yaml). It is written in Nim and is much faster for large files.
However, since it is generic, it does not offer all the graph operations that gfapy offers.
I was trying to load a GFA1 file with gfapy, but I had to kill the process because it had been running for over 15 minutes without finishing. I am not sure what could be wrong.
The GFA is a de Bruijn graph produced by convertToGFA.py, a script that converts the contigs from the bcalm2 output into a valid GFA1 file. The graph has 944785 nodes and 2419232 edges.
Minimal example here:
Is it supposed to take this long?
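As a quick sanity check on the size of the graph, independent of gfapy, the segment (S) and link (L) records can be counted directly. This is a minimal sketch using only the standard library. It assumes a tab-separated GFA1 file with one record per line, and the function name count_gfa1_records is made up for illustration.

```python
from collections import Counter


def count_gfa1_records(lines):
    """Count GFA1 records by their type field (first tab-separated column)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        record_type = line.split("\t", 1)[0]
        counts[record_type] += 1
    return counts


# Tiny in-memory example: one header, two segments, one link.
example = [
    "H\tVN:Z:1.0\n",
    "S\t0\tACGT\n",
    "S\t1\tCGTA\n",
    "L\t0\t+\t1\t+\t3M\n",
]
counts = count_gfa1_records(example)
print(counts["S"], counts["L"])  # prints: 2 1
```

For a real file, pass the open file handle instead of a list, for example count_gfa1_records(open(path)), so the file is read line by line without building the graph in memory.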