Open SSoelvsten opened 1 year ago
To quickly have the feature (but with fewer/no mathematical guarantees) we can also quicky do one of the following:
Currently, we already provide quite a few bits of meta-information, that we can use to derive a hash-value in constant time without breaking equality checking (see its early-termination cases).
We can also extend the node_writer
with updating an accumulated hash of all pushed nodes (as was done by Anders B. Clausen and Kent Nielsen for the BSc Project).
To use BDDs, resp. ZDDs, in any hash-related context (e.g.
std::unordered_map
andstd::unordered_set
), one needs to derive a hash value for each BDD, resp. ZDD, care about the hash value at the root.Notice, we only care about the value of the root. So, we do not need to store the hash value within each and every node. Instead, similar to canonicity in #127, we can store the hash of the root (and its negation) as two numbers in the
shared_levelized_file<node>
and merely propagate the hash values in the priority queue ofreduce
.Hash Functions
What follows are three hash function approaches of increasing complexity (and quality). They can also potentially be combined.
Meta Hash
Currently, we already provide quite a few bits of meta-information, such as the BDDs and ZDDs width and the number of arcs to terminals. We can also derive the size and the top-variable by using 1 I/O to load the first block from the disk.
We can combine all of these numbers into a single 64-bit number for a relatively decent hash function.
Linear Hash
In Adiar, a BDD is a list of nodes. Each node can be thought of as a character and hence the entire BDD as a string. If we somehow can hash a node into 64-bits, then we can accumulate all of them. This can be done easily by merely extending
node_writer::unsafe_push(const node& n)
.Hashing a single Node
Each BDD node is a triple ((x, id), low, high) of 64-bit integers.
Requirements
node
Considering most BDD nodes have high level identifiers this seems prone to overflowing. It might be useful to invert the level identifier of both children; simply bit-wise negating all of the level-identifier bits (if .is_node() is true) may suffice.
Accumulating Hashes
Based on [Thorup15] the mathematical sound solution is to use a prime p and compute the sum of hash(ni) ai modulo p where a is a seed and p is a prime number. The index i is in ascending order as the when nodes are pushed. To make the modulo operation fast, we would need to use a Mersenne Prime such as 231-1 or 281-1.
Recursive
Note: this depends on #412 .
As Randal E. Bryant was thinking about ways to improve the performance of Adiar's equality checking, he got reminded of the work of [Blum80] about hashing BDDs and ZDDs. Since the equality checking in #127 already resolved this in a much better way, I have not pursued this further.
Implementation in Reduce
Let p be a prime number (though the math may work out even when doing all computations with the non-prime p = 2k, i.e. by abusing the overflow of unsigned integers). Consider a hash function H (all numbers computed modulo p) defined as follows
Then the probability of two different BDDs share the same hash value is 1/p.
Testing
It seems hard to unit test a hash function. We may just want to create a benchmark to evaluate empirically the number of collisions?
Applications
std::hash
for Adiar's BDDs and ZDDs that use these hash values as keys in a hash table.node_stream
. This way, we can identify soft data corruptions.References
[Blum80] Manuel Blum, Ashok K. Chandra, and Mark N. Wegman. “Equivalence of free boolean graphs can be decided probabilistically in polynomial time”. In: 27th ACM/IEEE Design Automation Conference. pp. 80 – 82 (1980)
[Brace90] Karl S. Brace, Richard L. Rudell, Randal E. Bryant. “Efficient implementation of a BDD package”. In: 27th ACM/IEEE Design Automation Conference. pp. 40 – 45 (1990)
[Minato01] Shin-ichi Minato. “Zero-suppressed BDDs and their applications”. In: International Journal on Software Tools for Technology Transfer pp. 156 – 170 (2001)
[Thorup15]. “High Speed Hashing for Integers and Strings”. In: arXiv. (2015)