Closed · TysonRayJones closed 5 days ago
Presently, QuEST's distributed unit tests are not run on Github CI, which would limit them to 2 nodes. They are instead manually performed on up to 16 nodes on our office workstations, or on more nodes on our supercomputers, before each release.
There are plans to employ a private machine for distributed testing, in order to use more nodes and longer timeouts, and to integrate it with Github CI.
But clearly, in addition to this, we must develop a new suite of distributed tests using larger density matrices, which can access the algorithmic regimes of 'fewer than one column per node'. It will likely be prohibitive to test all functions in this new bespoke test suite; perhaps those not tested must be accompanied by a formal proof of their correctness (like in the distributed manuscript currently in preparation).
This new suite of tests could be scheduled to run on the Oxford ARCUS supercomputers for one week, launched every ~3 months, or triggered manually before a QuEST release. The tests could be randomised, but each run with a unique random seed, to widen the coverage between runs.
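For concreteness, here is a minimal sketch of such a seeded randomised launch; the seeding and logging scheme is illustrative only, not existing QuEST test code:

```cpp
#include <cstdio>
#include <random>

int main() {
    // choose a fresh seed per launch so repeated runs widen coverage,
    // and log it so any failure can be reproduced exactly
    unsigned seed = std::random_device{}();
    std::mt19937 gen(seed);
    std::printf("randomised test run with seed %u\n", seed);

    // ... all random Quregs, target qubits and channel parameters
    //     would then be drawn from gen, e.g. ...
    std::uniform_int_distribution<int> targetQubit(0, 4);
    std::printf("example random target qubit: %d\n", targetQubit(gen));
}
```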
In v4, density-matrix Quregs are restricted (as per d179d836db1205abd3ac1abc23bdcdefb966cfc7) from being distributed such that a single column spans multiple nodes. This turns out to be a ridiculous and unperformant regime, and violates the preconditions of many algorithms.
Presently, the unit tests use only Quregs of a fixed size: 5 qubits (both for state-vectors and density matrices). This is for several reasons: the test runtime grows rapidly with #qubits, although 5 qubits is still lightning fast; and 5 qubits is the threshold above which the number of inputs to functions like `mixMultiQubitKrausMap` explodes intractably. Hence, systematic testing of the QuEST API for 6+ qubits is infeasible.
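To illustrate the explosion with a back-of-envelope count (mine, not from the QuEST docs): an n-qubit channel may require up to 4^n Kraus operators, each of dimension 2^n x 2^n, so even specifying a single map takes up to 16^n complex numbers:

```cpp
#include <cstdio>
#include <cstdint>

int main() {
    // an n-qubit channel needs at most 4^n Kraus operators, each 2^n x 2^n,
    // so one map alone is specified by up to 4^n * 4^n = 16^n complex numbers
    for (int n = 1; n <= 6; n++) {
        std::uint64_t numOps  = 1ULL << (2 * n);   // up to 4^n operators
        std::uint64_t opElems = 1ULL << (2 * n);   // (2^n)^2 elements each
        std::printf("n = %d: up to %llu complex parameters per map\n",
                    n, (unsigned long long)(numOps * opElems));
    }
}
```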
This imposes a limit on the number of nodes which can distribute our unit tests, since there can be no fewer than 1 amplitude per node. Hence, we cannot distribute 5-qubit state-vectors between more than 32 nodes. Furthermore, although we could in principle distribute 5-qubit density matrices between up to 1024 nodes, the current unit tests require the state-vectors and density matrices to contain the same number of qubits. This is mandatory in some functions like `calcFidelity`, but in others is just a result of software structure and code cleanliness.
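As a sanity check of these ceilings (the helper function here is illustrative, not part of the QuEST API):

```cpp
#include <cstdio>

// the finest possible distribution is 1 amplitude per node; an n-qubit
// state-vector has 2^n amplitudes, an n-qubit density matrix has 2^(2n)
int maxNodes(int numQubits, bool isDensityMatrix) {
    return 1 << (isDensityMatrix ? 2 * numQubits : numQubits);
}

int main() {
    std::printf("5-qubit state-vector:   %d nodes max\n", maxNodes(5, false)); // 32
    std::printf("5-qubit density matrix: %d nodes max\n", maxNodes(5, true));  // 1024
}
```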
This means our unit tests can be run on a maximum of 32 nodes - and this is a big problem for our density matrix tests. A 5-qubit density matrix distributed between 32 nodes contains exactly one column of amplitudes on every node. When distributed between fewer nodes, each node will contain more than one column. But since we cannot employ more than 32 nodes, we cannot access the simulation regime where a node contains fewer than one column (e.g. a half column, or a quarter).
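The arithmetic of this regime, for our fixed 5-qubit tests:

```cpp
#include <cstdio>

int main() {
    // a 5-qubit density matrix has 2^5 = 32 columns of 2^5 amplitudes each;
    // distributed over numNodes nodes, each node stores 32/numNodes columns
    int numColumns = 1 << 5;
    for (int numNodes = 8; numNodes <= 128; numNodes *= 2)
        std::printf("%3d nodes: %.2f columns per node\n",
                    numNodes, (double) numColumns / numNodes);
    // 64 nodes: 0.50 columns; 128 nodes: 0.25 - precisely the regimes
    // our 32-node ceiling prevents us from ever testing
}
```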
Although a somewhat esoteric deployment pattern in real simulations, this is a very, very important regime to test, since it can drastically change the nature of the distributed algorithm. In fact, issues #323, #324, #325 and #326 relate to algorithmic errors due to a lack of consideration of this regime - and they went undiscovered because the unit tests which would otherwise expose these problems cannot be run at >= 64 nodes!
Some functions, like `mixTwoQubitDepolarisingChannel`, have (when corrected) distinct algorithmic regimes for when each node contains 'fewer than a quarter column', 'fewer than a half column', and 'one or more columns'. So it is absolutely crucial we test all these regimes! Furthermore, the regime where a node contains fewer than a column often leads to more complicated algorithms. This has meant (as was the case in the above issues) that some QuEST distributed algorithms are erroneously conceived geometrically with a >= 1 column-per-node picture in mind. This dooms the algorithm to unknowingly fail in a distributed regime we cannot yet unit test.
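A caricature of such regime-dispatch (purely illustrative; this is not QuEST's actual implementation, and the function name is hypothetical):

```cpp
#include <cstdio>

// dispatch between distinct distributed algorithms, based on how much of
// one density-matrix column (2^numQubits amplitudes) each node stores
void applyChannelDistributed(int numQubits, long long ampsPerNode) {
    long long ampsPerColumn = 1LL << numQubits;

    if (ampsPerNode >= ampsPerColumn)
        std::printf("regime: one or more columns per node\n");
    else if (2 * ampsPerNode >= ampsPerColumn)
        std::printf("regime: at least a half column, fewer than one\n");
    else if (4 * ampsPerNode >= ampsPerColumn)
        std::printf("regime: at least a quarter column, fewer than a half\n");
    else
        std::printf("regime: fewer than a quarter column per node\n");
    // each branch generally needs a different communication pattern,
    // and hence each must be independently unit tested
}
```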
Presently, the unit tests consult a `NUM_QUBITS` pre-processor macro for the Qureg size. Indeed, this is easy to modify in-code, or to overwrite at compilation time.
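For instance, a guarded definition (the guard itself is an assumption; the real test header may define the macro differently) would permit a per-build override:

```cpp
// overridable at compile time, e.g. via -DNUM_QUBITS=8
#ifndef NUM_QUBITS
#define NUM_QUBITS 5
#endif

#include <cstdio>

int main() {
    std::printf("testing with Quregs of %d qubits\n", NUM_QUBITS);
}
```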
However, we cannot simply scale up all our systematic testing to higher qubit numbers, since the possible input space, and hence the runtime, explodes. It is tempting to propose randomised testing of some functions at higher qubit numbers. But as alluded to in #324, some functions have distinct algorithmic regimes which are very rare, activated only by very obscure circumstances; as #qubits increases, these become increasingly unlikely to be triggered in random testing. It is true that in such a case, the author of the function can recognise the distinct algorithmic regimes and ensure they are all covered by bespoke higher-qubit tests; but sometimes the author (like myself) may not realise the need for distinct regimes, and hence not create these bespoke tests (like in #326).

Correcting this limitation in our current distributed unit tests appears quite a challenge! I leave this post to describe only the problem, so that solution proposals may appear below.