SSoelvsten / adiar

An I/O-efficient implementation of (Binary) Decision Diagrams
https://ssoelvsten.github.io/adiar/
MIT License
24 stars 13 forks source link

Compute Hash Values for Decision Diagrams #434

Open SSoelvsten opened 1 year ago

SSoelvsten commented 1 year ago

To use BDDs, resp. ZDDs, in any hash-related context (e.g. std::unordered_map and std::unordered_set), one needs to derive a hash value for each BDD, resp. ZDD, care about the hash value at the root.

Notice, we only care about the value of the root. So, we do not need to store the hash value within each and every node. Instead, similar to canonicity in #127, we can store the hash of the root (and its negation) as two numbers in the shared_levelized_file<node> and merely propagate the hash values in the priority queue of reduce.

Hash Functions

What follows are three hash function approaches of increasing complexity (and quality). They can also potentially be combined.

Meta Hash

Currently, we already provide quite a few bits of meta-information, such as the BDDs and ZDDs width and the number of arcs to terminals. We can also derive the size and the top-variable by using 1 I/O to load the first block from the disk.

We can combine all of these numbers into a single 64-bit number for a relatively decent hash function.

Linear Hash

In Adiar, a BDD is a list of nodes. Each node can be thought of as a character and hence the entire BDD as a string. If we somehow can hash a node into 64-bits, then we can accumulate all of them. This can be done easily by merely extending node_writer::unsafe_push(const node& n).

Hashing a single Node

Each BDD node is a triple ((x, id), low, high) of 64-bit integers.

Requirements

node

Considering most BDD nodes have high level identifiers this seems prone to overflowing. It might be useful to invert the level identifier of both children; simply bit-wise negating all of the level-identifier bits (if .is_node() is true) may suffice.

Accumulating Hashes

Based on [Thorup15] the mathematical sound solution is to use a prime p and compute the sum of hash(ni) ai modulo p where a is a seed and p is a prime number. The index i is in ascending order as the when nodes are pushed. To make the modulo operation fast, we would need to use a Mersenne Prime such as 231-1 or 281-1.

Recursive

Note: this depends on #412 .

As Randal E. Bryant was thinking about ways to improve the performance of Adiar's equality checking, he got reminded of the work of [Blum80] about hashing BDDs and ZDDs. Since the equality checking in #127 already resolved this in a much better way, I have not pursued this further.

Implementation in Reduce

Let p be a prime number (though the math may work out even when doing all computations with the non-prime p = 2k, i.e. by abusing the overflow of unsigned integers). Consider a hash function H (all numbers computed modulo p) defined as follows

Then the probability of two different BDDs share the same hash value is 1/p.

Testing

It seems hard to unit test a hash function. We may just want to create a benchmark to evaluate empirically the number of collisions?

Applications

References

SSoelvsten commented 11 months ago

To quickly have the feature (but with fewer/no mathematical guarantees) we can also quicky do one of the following: