Qiskit / qiskit

Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.
https://www.ibm.com/quantum/qiskit
Apache License 2.0

Make QuantumState.sample_counts faster, O(1) rather than O(N) #8535

Open jlapeyre opened 2 years ago

jlapeyre commented 2 years ago

The current implementation of sampling $N$ shot "counts" from QuantumState is $O(N)$. But there is a method that is $O(1)$ in $N$ (its cost depends only on the number of distinct outcomes).

EDIT: See #8618 for a more informed take on this issue. EDIT: I didn't realize when writing this that it had already been discussed in various places in Qiskit, and even implemented (elsewhere, not in QuantumState); see for example #8137.

The problem

Specifically, one should sample directly from the distribution of the count for each index rather than sampling from the distribution of indices and then building a count map. The current method is $O(N)$ in both time and space (although the latter is a constraint due to Python, not the algorithm itself). The method proposed here is $O(1)$ in $N$ in both time and space.

Currently, to build a count map of $N$ samples of measuring the state vector, we do the following (a rough Python sketch follows the list):

  1. Compute the probabilities $p(i) = |c_i|^2$ of each computational basis state from the instance of Statevector. (This step varies depending on the subclass of QuantumState.)
  2. Collect N samples from this discrete distribution $p(i)$. This is (in essence) a list of integer indices $i$ each representing a basis state.
  3. Build and return a count map. That is, a dictionary counts where each key is an integer index i, and counts[i] is the number of times i occurred in the list generated in step 2.
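For concreteness, here is a rough sketch of that flow (not the literal Qiskit source; sample_counts_slow is a name I made up, and I'm assuming the Statevector.probabilities() method):

from collections import Counter
import numpy as np
from qiskit.quantum_info import Statevector

def sample_counts_slow(state: Statevector, shots: int, rng=None):
    """Roughly the current approach: draw `shots` individual indices, then tally them."""
    rng = np.random.default_rng() if rng is None else rng
    probs = state.probabilities()                           # step 1: p(i) = |c_i|^2
    samples = rng.choice(len(probs), size=shots, p=probs)   # step 2: O(shots) index draws
    return Counter(samples.tolist())                         # step 3: build the count map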

The solution

EDIT: See the next comment for a numpy built-in function that implements this solution.

A similar, but simpler, problem is sampling the number of heads in $N$ coin tosses. You could simulate $N$ coin tosses and count the number of heads, but it is faster to instead sample once from the binomial distribution.
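(A quick illustration of that equivalence, not from the issue itself:)

import numpy as np
rng = np.random.default_rng()

N = 1_000_000
heads_slow = np.count_nonzero(rng.random(N) < 0.5)  # O(N): simulate every toss and count heads
heads_fast = rng.binomial(N, 0.5)                   # O(1) in N: draw the count directly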

For our sampling problem:

  1. Obtain a sample $\tilde{k}_1$ of the R.V. $C_1 \sim B(N, p(1))$, the number of samples equal to the first index $1$, where $B(\cdot, \cdot)$ is the binomial distribution.
  2. Use the fact that, in general, to sample from $\Pr(X=x \text{ and } Y=y)$, we can first obtain a sample $\tilde{x}$ from $\Pr(X=x)$ and then sample from $\Pr(Y=y \mid X = \tilde{x})$. In the case at hand, you sample from $\Pr(C_2 = k_2 \mid C_1 = \tilde{k}_1)$, which is the density of $B(N-\tilde{k}_1,\, p(2) / (1 - p(1)))$.
  3. Repeat step 2 for $C_3$, and so on. More precisely, set $X = (C_1, C_2)$ and $Y=C_3$.

I didn't code this in Python, but the Julia code below can easily be translated (a rough Python translation follows the Julia function).

julia> const pd =  [.1, .2, .4, .3]; # A probability distribution

julia> find_samps(pd, 10^12)  # N = 10^12
4-element Vector{Int64}:
 100000104386
 200000077220
 399999709325
 300000109069

julia> @btime find_samps($pd, 10^12);  #  This is like %timeit
  224.429 ns (2 allocations: 144 bytes)  # Time to run `find_samps` once.

Here is the code:

using Distributions: Binomial

"""
    find_samps(probs, N)

Sample from the distribution of counts obtained by drawing `N` samples from discrete
distribution `probs`.
"""
function find_samps(probs, N)
    sum(probs) ≈ 1 || error("probs is not a normalized probability distribution")
    Nrem = N   # shots not yet assigned to an index; N - Nrem have been collected so far.
    counts = Int[] # samples of counts
    q = one(eltype(probs)) # total probability mass of the indices not yet processed
    for p in @view probs[begin:end-1] # @view just avoids copying
        nsamp = rand(Binomial(Nrem, p / q))
        push!(counts, nsamp)
        Nrem -= nsamp
        q -= p
    end
    push!(counts, Nrem) # All the remaining counts go with last i with prob 1.
    return counts
end
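For what it's worth, a rough (untested, illustrative) Python/NumPy translation of the same function might look like this:

import math
import numpy as np

def find_samps(probs, num_shots, rng=None):
    """Sample the vector of counts obtained by drawing `num_shots` samples from the
    discrete distribution `probs`, without drawing the individual samples."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probs is not a normalized probability distribution")
    rng = np.random.default_rng() if rng is None else rng
    n_remaining = num_shots  # shots not yet assigned to an index
    q = 1.0                  # total probability mass of the indices not yet processed
    counts = []
    for p in probs[:-1]:
        nsamp = int(rng.binomial(n_remaining, min(p / q, 1.0)))  # guard against float round-off
        counts.append(nsamp)
        n_remaining -= nsamp
        q -= p
    counts.append(n_remaining)  # all remaining shots go to the last index
    return counts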
jlapeyre commented 2 years ago

TL;DR There is a single numpy function that does what we want. We should use it.

A discrete distribution defined by density $(p_1, \ldots, p_n)$ is sometimes called a categorical distribution. According to Wikipedia it is also called a generalized Bernoulli distribution or multinoulli distribution, which I like because they are descriptive. Poking around online, including Wikipedia and Stack Exchange, I don't see a method as efficient as the one above for drawing many samples from this distribution. This is odd because it's not exactly rocket science.

However, just as drawing many samples from a Bernoulli distribution is equivalent to drawing one sample from the associated binomial distribution (i.e. the coin-flipping example mentioned above), drawing many samples from a generalized Bernoulli distribution is equivalent to drawing a single sample from a multinomial distribution. numpy has a function for this. I didn't look at the code, but it is easy to see that it is using an $O(1)$ (in the number of samples) algorithm, probably equivalent to the one presented above (and the Julia code). (It is about 4 times slower than the Julia code, ~~which makes me think it might not be implemented purely in C.~~)

Here is the same example I gave above:

In [1]: from numpy.random import default_rng
In [2]: import numpy as np
In [3]: probs = np.array([.1, .2, .4, .3])
In [4]: rng = default_rng()
In [5]: rng.multinomial(100_000_000_000, probs)
Out[5]: array([10000046586, 20000067875, 40000025644, 29999859895])
In [6]: %timeit rng.multinomial(100_000_000_000, probs)
992 ns ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

EDIT: It is exactly the same algorithm that I have above. It is pure C, but there is a bit of overhead somewhere (or maybe the rng implementation is the limiting factor).

void random_multinomial(bitgen_t *bitgen_state, RAND_INT_TYPE n,
                        RAND_INT_TYPE *mnix, double *pix, npy_intp d,
                        binomial_t *binomial) {
  double remaining_p = 1.0;
  npy_intp j;
  RAND_INT_TYPE dn = n;
  for (j = 0; j < (d - 1); j++) {
    mnix[j] = random_binomial(bitgen_state, pix[j] / remaining_p, dn, binomial);
    dn = dn - mnix[j];
    if (dn <= 0) {
      break;
    }
    remaining_p -= pix[j];
  }
  if (dn > 0) {
      mnix[d - 1] = dn;
  }
}

Note the check to see whether any samples remain to be drawn; I neglected to do that in the Julia version. For completeness: Julia of course has an equivalent function:

julia> const multd = Multinomial(10^12, [0.1, 0.2, 0.4, 0.3]);
julia> print(rand(multd))
[99999679158, 200000111995, 400000751567, 299999457280]
julia> @btime rand(multd)
  186.036 ns (1 allocation: 96 bytes)

numpy uses PCG64 by default and Julia uses Xoshiro256++. Casual comments online indicate that the latter is faster; it's not yet available in numpy.

A test with a probability vector of length 100 rather than 4 shows numpy only 20% slower than Julia. This kind of scaling in numpy performance is very common; it usually indicates that the worse performance with smaller arrays is due to overhead in the Python/numpy interface.
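Putting it together for this issue, a minimal sketch of what sample_counts could do instead of drawing individual indices (the helper name and bit-string key formatting here are my own illustration, not the actual Qiskit interface):

import numpy as np
from qiskit.quantum_info import Statevector

def sample_counts_fast(state: Statevector, shots: int, rng=None):
    """Draw the whole count vector in one multinomial sample instead of `shots` index draws."""
    rng = np.random.default_rng() if rng is None else rng
    probs = state.probabilities()
    counts = rng.multinomial(shots, probs)  # cost independent of `shots`
    # Keep only the observed outcomes, keyed by the basis-state bit string.
    return {format(i, "0{}b".format(state.num_qubits)): int(c)
            for i, c in enumerate(counts)
            if c > 0}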