Open JackAtOmenApps opened 2 years ago
The graphs produced with django-postgresql-dag are likely to be non-isomorphic.
What is meant by this sentence? isomorphism AFAIK is a property of a pair of graphs ( graph X is isomorphic to graph Y ), and not a stand alone property of a graph.
In preparation for some future work, it would be convenient to be able to calculate the hash of a connected graph, assuming no directivity in edges. That is, if we query for all nodes connected to a node using django-postgresql-dag's
connected_graph()
method, we should be able to calculate a hash value of the current state of all connected elements. While potentially expensive to compute †, this should allow guaranteed identification of changes to the graph.It is possible to have multiple DAGs using django-postgresql-dag. For this issue, we want to calculate the hash of a particular DAG, not all nodes/graphs within the project. To do this, we might approach the problem in the following manner:
connected_graph()
query) to retrieve all nodes in the current graph.id
values of that node.Recommendations for improvement are welcome. There's likely a more memory-friendly approach that will still get the job done.
Alternative approaches considered: NetworkX Provides a
weisfeiler_lehman_graph_hash()
function, but it doesn't work well on graphs that are not isomorphic. The graphs produced with django-postgresql-dag are likely to be non-isomorphic.† Potentially expensive in time and memory. The tuple we use to calculate the hash contains the
id
of every node in the graph, and for each node, theid
s of the parent nodes. Using hashllib's blake2s hash will probably be a speedy approach, but the entire process of querying, converting, and iterating to calculate the hash will take a certain amount of time.