
✨ Mark and Sweep Garbage Collection for DD Package #644

Open burgholzer opened 2 months ago

burgholzer commented 2 months ago

What's the problem this feature will solve?

As of now, our decision diagram package uses manual reference counting for garbage collection, i.e., every DD node contains a 32-bit field tracking its reference count. The reference count is incremented and decremented before and after manipulating decision diagrams.
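
For reference, here is a minimal sketch of what such a node could look like. This is an illustrative layout only, not the actual node structure in mqt-core (which, e.g., draws its edge weights from a shared complex-number table); the point is merely the dedicated `ref` field.

```cpp
#include <array>
#include <complex>
#include <cstdint>

struct Node;

struct Edge {
  Node* node = nullptr;          // successor node
  std::complex<double> weight{}; // edge weight
};

struct Node {
  std::array<Edge, 2> e{}; // |0>- and |1>-successor edges
  Node* next = nullptr;    // intrusive chaining within a unique-table bucket
  std::uint32_t ref = 0;   // manual reference count driving garbage collection
  std::int16_t v = 0;      // qubit/variable index the node belongs to
};

int main() {
  Node n{};
  ++n.ref; // a DD referencing this node takes ownership
  --n.ref; // ... and releases it again; ref == 0 marks the node as "dead"
}
```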

For example, in classical simulation, the reference count of the initial state (DD) is increased. Then, the quantum operations of the computation are applied one after the other while always increasing the reference count of the new state and decreasing the reference count of the old state.
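
A hedged sketch of this discipline during simulation is shown below. The `Package` interface (`incRef`/`decRef`/`garbageCollect`/`applyOperation`) is assumed for illustration and may not match the actual mqt-core API exactly; the stubs only convey the reference-counting pattern.

```cpp
#include <vector>

struct VectorDD { /* root edge of a state decision diagram */ };
struct Operation { /* a quantum gate */ };

struct Package {
  void incRef(const VectorDD& /*dd*/) { /* bump ref counts along the DD (stub) */ }
  void decRef(const VectorDD& /*dd*/) { /* lower ref counts along the DD (stub) */ }
  void garbageCollect() { /* check table limits and collect if necessary (stub) */ }
  VectorDD applyOperation(const Operation& /*op*/, const VectorDD& in) { return in; }
};

VectorDD simulate(Package& dd, VectorDD state, const std::vector<Operation>& circuit) {
  dd.incRef(state); // keep the initial state alive
  for (const auto& op : circuit) {
    auto newState = dd.applyOperation(op, state);
    dd.incRef(newState); // protect the new state ...
    dd.decRef(state);    // ... then release the old one
    state = newState;
    dd.garbageCollect(); // called after each adjustment (see below)
  }
  return state;
}

int main() {
  Package dd;
  auto result = simulate(dd, VectorDD{}, std::vector<Operation>(3));
  (void)result;
}
```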

After each adjustment, the garbage collection routine is called, which checks all unique tables for whether they have reached their (configurable) garbage collection limit. If one of them has reached its limit, the actual collection is started and the respective compute tables are cleared.
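
Roughly, the trigger logic could look as follows; the names (`UniqueTable`, `gcLimit`, `ComputeTable`, ...) are illustrative placeholders rather than the concrete mqt-core classes.

```cpp
#include <cstddef>
#include <vector>

struct UniqueTable {
  std::size_t liveEntries = 0;
  std::size_t gcLimit = 131072; // configurable garbage collection limit

  bool limitReached() const { return liveEntries >= gcLimit; }
  void collectDeadNodes() { /* remove every node with ref == 0 (stub) */ }
};

struct ComputeTable {
  void clear() { /* cached results may reference collected nodes (stub) */ }
};

// Called after each reference-count adjustment.
void garbageCollect(std::vector<UniqueTable>& uniqueTables,
                    std::vector<ComputeTable>& computeTables) {
  bool collect = false;
  for (const auto& table : uniqueTables) {
    if (table.limitReached()) {
      collect = true;
      break;
    }
  }
  if (!collect) {
    return;
  }
  for (auto& table : uniqueTables) {
    table.collectDeadNodes();
  }
  for (auto& table : computeTables) {
    table.clear();
  }
}

int main() {
  std::vector<UniqueTable> unique(2);
  std::vector<ComputeTable> compute(3);
  garbageCollect(unique, compute); // no table has reached its limit here
}
```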

On the positive side, this kind of reference counting always allows one to determine the number of "dead" nodes in the unique tables, i.e., nodes with a reference count of 0. If only a single active DD is maintained at a time, this directly gives the size of that decision diagram in O(1). It also allows one to directly judge the percentage of active vs. dead nodes in the unique tables, which could serve as the basis for deciding whether to grow a unique table or to conduct garbage collection. However, the latter scenario is hypothetical at the moment, as the tables currently cannot be resized.
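
A small sketch of the bookkeeping this enables (field and method names are illustrative):

```cpp
#include <cstddef>

// If a unique table tracks its dead entries alongside its total entries, the
// size of a single active DD and the active/dead ratio are available in O(1).
struct UniqueTableStats {
  std::size_t entries = 0;     // total nodes stored in the table
  std::size_t deadEntries = 0; // nodes whose reference count is 0

  std::size_t activeNodes() const { return entries - deadEntries; }

  // Could drive a "grow the table vs. collect garbage" decision.
  double deadRatio() const {
    return entries == 0 ? 0.0
                        : static_cast<double>(deadEntries) / static_cast<double>(entries);
  }
};

int main() {
  UniqueTableStats stats{1000, 400};
  (void)stats.activeNodes(); // 600: size of the single active DD in O(1)
  (void)stats.deadRatio();   // 0.4: share of dead nodes
}
```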

On the negative side, the continuous counting means that essentially every DD manipulation has to touch and update reference counts, and every node is enlarged by the 32-bit ref field.

In essence, keeping track of the reference count does not gain us much, while it introduces several performance bottlenecks.

Describe the solution you'd like

The literature on Binary Decision Diagrams (BDDs) has established two kinds of garbage collection schemes: reference counting, as we adopted it, and a mark-and-sweep approach, where there is no dedicated reference count field in each node and no continuous reference counting. Instead, at periodic intervals, the population of the unique tables is checked (our garbage collection check), and if a table contains too many nodes, all active DDs are traversed once in a depth-first fashion, marking every node (this mark can probably be optimized/hidden in one of the pointer bits of the node). Subsequently, the unique tables are crawled and any unmarked node is collected. Afterwards, the active nodes are unmarked.
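
Here is a hedged sketch of that scheme on a simplified node/unique-table layout (not the actual mqt-core data structures); the unmarking of the surviving active nodes is folded into the sweep here.

```cpp
#include <array>
#include <vector>

struct Node {
  std::array<Node*, 2> succ{nullptr, nullptr}; // successor nodes
  Node* next = nullptr; // intrusive bucket chain within the unique table
  bool marked = false;  // could be hidden in a pointer bit instead (see below)
};

// Mark phase: depth-first traversal of an active DD starting at its root.
void mark(Node* node) {
  if (node == nullptr || node->marked) {
    return;
  }
  node->marked = true;
  for (auto* successor : node->succ) {
    mark(successor);
  }
}

// Sweep phase: crawl every unique-table bucket, collect unmarked nodes, and
// reset the mark on the surviving (active) ones.
void sweep(std::vector<Node*>& buckets) {
  for (auto*& head : buckets) {
    Node** link = &head;
    while (*link != nullptr) {
      Node* current = *link;
      if (!current->marked) {
        *link = current->next; // unlink the dead node from the chain
        delete current;        // or return it to the package's node pool
      } else {
        current->marked = false; // unmark for the next collection
        link = &current->next;
      }
    }
  }
}

void markAndSweep(const std::vector<Node*>& activeRoots, std::vector<Node*>& buckets) {
  for (auto* root : activeRoots) {
    mark(root);
  }
  sweep(buckets);
}

int main() {
  std::vector<Node*> buckets(1, nullptr);
  std::vector<Node*> activeRoots; // no active DDs: every stored node is collected
  markAndSweep(activeRoots, buckets);
}
```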

This only requires two traversals of the active decision diagrams at the point of initiating the mark-and-sweep. The reference counting approach would have traversed at least as much of the diagrams over the course of the reference counting procedure, so this is an easy win for runtime. As for the memory footprint, this also completely eliminates the need for the ref member in the nodes. The "mark" can be hidden in the bits of a pointer or in a flag array (similar to what we already use for matrices), provided the alignment of the node structs permits adding such flags without increasing their size (effectively making use of the padding due to alignment requirements).
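
One common way to hide the mark is to tag the least-significant bit of an (aligned) node pointer. The following is a sketch of that generic technique, not necessarily the representation that will end up in mqt-core.

```cpp
#include <cstdint>

// Node pointers handed out by an allocator are at least word-aligned, so the
// low bits of the address are always zero and one of them can carry the mark
// without increasing the node size.
template <typename Node>
class MarkedPtr {
  std::uintptr_t bits = 0;
  static constexpr std::uintptr_t kMarkBit = 1U;

public:
  MarkedPtr() = default;
  explicit MarkedPtr(Node* p) : bits(reinterpret_cast<std::uintptr_t>(p)) {}

  Node* get() const { return reinterpret_cast<Node*>(bits & ~kMarkBit); }
  bool isMarked() const { return (bits & kMarkBit) != 0U; }
  void setMark() { bits |= kMarkBit; }
  void clearMark() { bits &= ~kMarkBit; }
};

int main() {
  struct Node { int v = 0; };
  Node n;
  MarkedPtr<Node> p{&n};
  p.setMark();     // mark the node without a dedicated flag field
  p.get()->v = 42; // the original pointer is still recoverable
  p.clearMark();   // unmark after the sweep
  return p.isMarked() ? 1 : 0;
}
```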

Naturally, this is a breaking change and will require adaptations throughout this library as well as in mqt-ddsim and mqt-qcec. However, the degree of changes should be moderately low.

As always with these kinds of changes, it is important to benchmark the overall performance of the resulting package. This is a perfect use case for our DD package benchmarking tool (which might require a few adaptations as well, based on the proposed changes).