Closed TypingCat closed 3 years ago
I tested python graph modules. Hash shows good performance.
Commit 5510eb1c7c2fbe8874d6fe938e6fa2accfc3e8e2 add a test code for the I/O operation and similarity calculation speed. Module networkx
and neo4j
are imported. Hash dict
is added as a reference. Other modules graphlib
, graph-theory
, graph-tool
are not tested, because they do not seem to fit the purpose of this project. The following is the results.
time_hash: mean 0.00000, max 0.00003
time_hash[30:]: mean 0.00000, max 0.00003
time_networkx: mean 0.00002, max 0.00013
time_networkx[30:]: mean 0.00002, max 0.00013
time_neo4j: mean 0.00590, max 0.01549
time_neo4j[30:]: mean 0.00604, max 0.01549
time_hash_full: mean 0.00002, max 0.00007
time_hash_full[30:]: mean 0.00002, max 0.00007
time_networkx_full: mean 1.24723, max 165.99804
time_networkx_full[30:]: mean 0.92807, max 2.81889
The subgraphs (1, 1), (1, 2), (1, 3) show the I/O operation time. For each step, a new local graph which contains 10 nodes and 10 edges is created, and merged on a global graph. Because 8 nodes are redundant, only two nodes are newly added to the global graph. In terms of input/output speed, dict
is 10 times faster than networkx
and 3000 times faster than neo4j
: neo4j
retired.
The subgraph (2, 1), (2, 2) show the similarity calculation time between local graph and global graph. networkx
can calculates approximations of graph edit distance for similarity. dict
doesn't support similarity calculation, so I create a simple edit distance calculation function. networkx
similarity function uses multi-threads, and it causes overflow after 500 steps: networkx
retired.
These results show that python graph modules are not suitable for this project. I'd better to create a graph structure using dict
.
There are several ways to implement nodes of graph. I can use hash or existing python modules. Hash is fast, and existing python modules have various functions. Searching and testing are needed to make a decision.