gongaa / SlidingWindowDecoder

Sliding window with Guided Decimation Guessing Decoding for QLDPC codes.
MIT License

Problem with the Sliding Window OSD file #1

Closed: ElonDormancy closed this issue 6 months ago

ElonDormancy commented 6 months ago

I'm reading the file Sliding Window OSD.ipynb in SlidingWindowDecoder to follow along, and there are some confusing things about the code.

First, when we iterate the anchors to get the sparse parity-check matrix of each window (I guess this is the $H_{win}$ mentioned in Shilin's paper, https://arxiv.org/abs/2311.03307), and the window is not the final (last) window, then `c = anchors[top_left + W - 1]`. But when we use $H_{win}$ to decode, `c = anchors[top_left+F]  # commit region bottom right`. Why are the two `c` different? It confuses me, and when I set W=3, F=3 and run Sliding Window OSD.ipynb, it raises an error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 sliding_window_decoder(N=144, p=0.004, num_repeat=12, W=3, F=3, num_shots=10000, max_iter=200, method=1, 
      2                        z_basis=True)
      4 plt.hist([x*1000 for x in decoding_time]) # convert s to ms

Cell In[4], line 154, in sliding_window_decoder(N, p, num_repeat, num_shots, max_iter, W, F, z_basis, noisy_prior, method, plot, shorten)
    152         total_e_hat[j][a[1]:b[1]] = e_hat
    153     else:
--> 154         total_e_hat[j][a[1]:c[1]] = e_hat[:c[1]-a[1]]
    157 print(f"Window {i}, flagged Errors: {num_flag_err}/{num_shots}")
    159 new_det_data = (det_data + total_e_hat @ chk.T) % 2

ValueError: could not broadcast input array from shape (1656,) into shape (2088,)
```

So which `c` is correct?

gongaa commented 6 months ago

Hi, this broadcast issue is due to column merging of the bottom-right part of $H_{win}$, caused by method=1. The merging is described at the end of Round Analysis.ipynb and in our paper. Please change method=1 to method=0 for no merging when using non-overlapping windows ($W=F$), and please also change `c = anchors[top_left+F]` to `c = anchors[min(top_left+F, len(anchors)-1)]`.

I extract `c = anchors[top_left+W-1]` so that this part of the window can be stacked with the merged columns (identity matrix). It has nothing to do with the later `c = anchors[top_left+F]` that determines the commit region.
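To make the two roles concrete, here is a minimal sketch (simplified slicing; `decode` is a stand-in for the BP+OSD call, not the notebook's exact code):

```python
a = anchors[top_left]                             # (row, col) of the window's top left
b = anchors[min(top_left + W, len(anchors) - 1)]  # roughly the window's bottom right

# Role 1: while building H_win, mark where the bottom-right block starts,
# so that (for method=1) it can be replaced by an identity with merged priors.
c_merge = anchors[top_left + W - 1]

# Role 2: after decoding, commit only the columns of the first F rounds.
c_commit = anchors[min(top_left + F, len(anchors) - 1)]

H_win = chk[a[0]:b[0], a[1]:b[1]]            # sparse window parity-check matrix
e_hat = decode(H_win, syndrome_win)          # stand-in for the BP+OSD call
total_e_hat[a[1]:c_commit[1]] = e_hat[:c_commit[1] - a[1]]
```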

ElonDormancy commented 6 months ago

OK, I see. Thanks :)

ElonDormancy commented 6 months ago

Hi Anqi, sorry to bother you again. I still have some questions about the sliding window decoder details.

Question 1: About column merging when method=1

As you mentioned in Round Analysis.ipynb

To toggle between different level of column merging, change method below to 0 (no merging), 1 (mild merging, most recommended), 2 (merge all columns that trigger only the detectors from the last round in the window).

method=0 means no merging, which is easy to understand, and with method=2 you merge all columns that only trigger the last-round SM circuit detectors in one window. But method=1 confuses me: you overwrite `c = (c[0], c[1]+n_half*3)` in the Z basis (X basis). Why do you add n_half*3 to c[1]? n_half*3 represents the next-round SM Z stabilizer detectors, and n_half*2 = n represents the next-round SM X stabilizer detectors. And what is the merging strategy for method=1?

To improve efficiency, we merge the columns of the bottom right (full-rank) matrix $H_1$ in each window to an identity matrix and recalculate priors.

[cite from your paper]. And your code is:

```python
noisy_syndrome_prior = np.sum(chk[c[0]:b[0], c[1]:b[1]] * priors[c[1]:b[1]], axis=1)  # you recalculate the priors
noisy_syndrome = np.zeros((n_half*W, n_half))
noisy_syndrome[-n_half:, :] = np.eye(n_half)  # * noisy_syndrome_prior
```

Why did you comment out `noisy_syndrome_prior`?

Question 2: XOR

Yes, when we map the syndromes as $(\sigma_1,\sigma_2,\sigma_3,\dots,\sigma_N)\rightarrow(\sigma_1,\sigma_2-\sigma_1,\sigma_3-\sigma_2,\dots,\sigma_N-\sigma_{N-1})$, one can see that no single fault triggers detectors from more than two rounds. But when we divide them into many windows, like a (3,1) sliding window, we get the first window syndrome $\text{SM}_1 = (\sigma_1,\sigma_2-\sigma_1,\sigma_3-\sigma_2)$, which is right; but for the second window we decode $\text{SM}_2 = (\sigma_2-\sigma_1,\sigma_3-\sigma_2,\sigma_4-\sigma_3)$, not the correct $\text{SM}_2^c = (\sigma_2,\sigma_3-\sigma_2,\sigma_4-\sigma_3)$. Which one is correct?

And the final round of syndrome measurement comes from the destructive measurement of the data qubits, which is more exact than the SM obtained from the syndrome measurement circuit. That SM is $\sigma_N-\sigma_{N-1}$, not $\sigma_N$. So if I want to use a more accurate decoder for the final round of syndrome measurement, which one is correct?

gongaa commented 6 months ago

No worries :) I'm happy that somebody really cares about the paper, and I also want to take the chance to clarify some subtle points through your questions.

Question 1: First note that everything I wrote, the paper and the code, is specific to the BB code and to the particular circuit proposed by IBM. If you change the circuit or the EC code, then the offset won't be n_half*3, and the shapes of $\mathbf{H}_0, \mathbf{H}_1, \mathbf{H}_2$ that I described in the Fig. 1 caption of our paper won't hold (but the SC-LDPC-like structure will remain if X and Z are decoded separately, so you can reuse most of my code after figuring out what their shapes are for your own purpose).

Now, coming back to your question: why, for method=1, I overwrote `c = (c[0], c[1]+n_half*3)`. Before this statement, within a window, c is the anchor pointing at the top left of the bottom-most $\mathbf{H}_0$ (please have Fig. 1 by your side). Since $\mathbf{H}_0$ has shape $(N/2, 3N/2)$, after this assignment, c points to the top left of the bottom-most $\mathbf{H}_1$.

The next statement is `noisy_prior = np.sum(chk[c[0]:b[0],c[1]:b[1]] * priors[c[1]:b[1]], axis=1)`. Here `chk[c[0]:b[0],c[1]:b[1]]` is this $\mathbf{H}_1$, and `priors[c[1]:b[1]]` holds the priors associated with each column of this $\mathbf{H}_1$. Remember that a parity-check matrix is always a zero-one matrix (this also answers your next question: `# * noisy_syndrome_prior` is shorthand for "this new prior will be associated with the merged column", please forgive my laziness in that comment). The merging strategy regarding the prior is just: for each row (detector), sum up the priors of the columns of $\mathbf{H}_1$ (fault mechanisms) that trigger this detector. It turns out that, for this particular IBM circuit, the symmetry leads to each detector having the same value; you can print out noisy_prior to see. That's why I only print the first entry noisy_prior[0] in the print statement. This merging strategy is indeed very heuristic, so I also allow users to overwrite noisy_prior from None to anything they like in the decoding interface.
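A minimal numpy sketch of this merging, assuming chk and priors are dense arrays (variable names follow the notebook where possible):

```python
import numpy as np

# Bottom-right full-rank block H_1 of the window and its column priors.
H1 = chk[c[0]:b[0], c[1]:b[1]]   # zero-one matrix, one row per detector
p1 = priors[c[1]:b[1]]           # prior of each H_1 column (fault mechanism)

# Merged prior: for each row (detector), sum the priors of the columns
# that trigger this detector.
merged_prior = (H1 * p1).sum(axis=1)

# For this particular IBM circuit, symmetry makes all entries equal,
# which is why only noisy_prior[0] is printed in the notebook.
assert np.allclose(merged_prior, merged_prior[0])
```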

For the X basis, I recommend using offset n_half*3, but I still left n_half*2 there. Please take a look at the window PCMs from the z_basis=False experiments in OSD.ipynb. Though $\mathbf{H}_0$ still has shape $(N/2, 3N/2)$, it looks a little different from the Z-basis $\mathbf{H}_0$. The last $N/2$ columns of $\mathbf{H}_0$ in the X basis all have the same prior (see Round Analysis.ipynb for partitioning columns according to their priors). I don't get this for free after permuting columns in the (more standard) Z basis, so I thought we now have more opportunities to study the effect of merging strength on the logical error rate in the X basis.

Question 2: Neither is correct; the next syndrome should depend on the first few columns that you committed to before moving to the next window. Maybe this sentence from the paper answers your question:

By committing to $\hat{\mathbf{e}}_0$ and $\hat{\mathbf{e}}_1$, we need to update $\mathbf{s}_2$ used for the next window by $\mathbf{s}'_2 = \mathbf{s}_2 + \mathbf{H}_2\hat{\mathbf{e}}_1$, then use $\mathbf{s}'_2, \mathbf{s}_3, \mathbf{s}_4$ to decode the next window.

The $\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3, \dots$ detector values returned by Stim correspond to your $\sigma_1, \sigma_2-\sigma_1, \sigma_3-\sigma_2, \dots$. If that doesn't answer your question, then perhaps there is some confusion in the circuit construction. Maybe the following helps: each row is already a detector (see row 43 of src/build_circuit.py), i.e., it gives you $s_1,s_2,s_3$, not $\sigma_1,\sigma_2,\sigma_3$.
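In code, the update from the quoted sentence is just one line; a hedged sketch (my own helper, using dense numpy arrays):

```python
import numpy as np

def update_next_syndrome(s2, H2, e_hat_1):
    """s2' = s2 + H2 @ e_hat_1 (mod 2): remove the effect of the committed
    correction e_hat_1 from the first syndrome of the next window."""
    return (s2 + H2 @ e_hat_1) % 2

# The next window is then decoded on (s2', s3, s4) instead of (s2, s3, s4).
```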

I'm not sure I understand your last question. From my implementation in /src/build_circuit.py, you can see I first calculate $\sigma_N$ (the noiseless syndrome computed from the collapsed data-qubit measurement results, lines 215-216), then XOR it with $\sigma_{N-1}$ (line 217). So this detector (row) is $\sigma_N-\sigma_{N-1}$. For the sparsity of the PCM, and thus better performance of BP-related methods, I do think this choice makes more sense than $\sigma_N$ alone. If you use $\sigma_N$ for detectors, then fault mechanisms from any round (even the first) can trigger them, and you would have to rewrite the syndrome update when it comes to the last window as well.
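For illustration, a hedged sketch of that last detector row (the variable names are mine, not those of src/build_circuit.py):

```python
import numpy as np

def last_round_detectors(H_z, data_meas, sigma_prev):
    """Final detector row: the noiseless syndrome sigma_N, computed from the
    collapsed data-qubit measurement results, XORed with sigma_{N-1}."""
    sigma_N = (H_z @ data_meas) % 2      # noiseless syndrome from final readout
    return (sigma_N + sigma_prev) % 2    # sigma_N XOR sigma_{N-1}
```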

Last point: I want to emphasize that one should not count on the noiseless syndrome too much, because it won't be available before non-Clifford gates.

ElonDormancy commented 6 months ago

Thanks for your detailed reply :), I have learned a lot! I may now have a better understanding of method=1. As you commented above, when we choose method=1 we can merge $H_1$ because the symmetry of the IBM code allows it.

But for Question 2, let me add more information about my question so we can discuss further.

As you mentioned in your paper

For example, assume detector values $s_1, s_2, s_3, \dots$ are observed

and in your src/build_circuit.py

```python
detector_circuit_str = ""
for i in range(n//2):
    detector_circuit_str += f"DETECTOR rec[{-n//2+i}]\n"
detector_circuit = stim.Circuit(detector_circuit_str)

detector_repeat_circuit_str = ""
for i in range(n//2):
    detector_repeat_circuit_str += f"DETECTOR rec[{-n//2+i}] rec[{-n-n//2+i}]\n"
detector_repeat_circuit = stim.Circuit(detector_repeat_circuit_str)
```

We can see that your XOR operations realize "no single fault triggers detectors from more than two rounds". If we call the SM result of round $i$ $\sigma_i$, then the multi-round syndromes are $(\sigma_1,\sigma_2-\sigma_1,\dots,\sigma_N-\sigma_{N-1})\rightarrow(s_1,s_2,s_3,\dots,s_N)$.
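For concreteness, here is a small numpy sketch (my own helper, not from the repo) of this mapping:

```python
import numpy as np

def detectors_from_syndromes(sigma):
    """sigma: (N, n_half) binary array of raw per-round SM results sigma_t.
    Returns the detector values: s_1 = sigma_1, s_t = sigma_t XOR sigma_{t-1}."""
    s = sigma.copy()
    s[1:] = (sigma[1:] + sigma[:-1]) % 2
    return s
```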

When we decode the first window, we use the syndrome result $(s_1,s_2,s_3)$, for example with a $(W,F)=(3,3)$ window, and the next window is $(s_4,s_5,s_6)$. Of course, if we use an overlapping window like $(3,1)$, we must update $s_2' = s_2 + H_2 \hat{e}_1$: we correct all the qubit and measurement errors in the time interval $(0, F]$ and keep the updated syndrome vectors in the time interval $[F+1, W]$ for the next error correction (EC) cycle. But when we use the non-overlapping window, we may not need to update the syndrome (is that correct?). So when we use the non-overlapping window, we would need to change src/build_circuit.py like this:

```python
# Z check detectors
if z_basis:
    if num_round % 3 != 0:  # (W,F) = (3,3) as example
        circuit += detector_repeat_circuit
    else:
        circuit += detector_circuit
```

so that the multi-round syndromes are:

First two windows as an example: $(\sigma_1,\sigma_2-\sigma_1,\sigma_3-\sigma_2,\sigma_4,\sigma_5-\sigma_4,\sigma_6-\sigma_5)$, not $(\sigma_1,\sigma_2-\sigma_1,\sigma_3-\sigma_2,\sigma_4-\sigma_3,\sigma_5-\sigma_4,\sigma_6-\sigma_5)$. That's the supplementary information for my Question 2.

In the paper https://arxiv.org/abs/2308.08648, they use a BP decoder on intervals of noisy cycles to control error accumulation. For the final round, they apply the BP+OSD decoder to eliminate residual syndromes and project the system into the codespace. And regarding your point about the final round: if I use $\sigma_N$ for detectors, then fault mechanisms from any round (even the first) can trigger them, but if we use $\sigma_N-\sigma_{N-1}$ we only consider fault mechanisms from the last and penultimate SM rounds. Do I understand correctly? And if we now use the non-overlapping window, which one is better for the final round?

And I agree with your last point, even though using the noiseless syndrome decreases the logical memory error rate a lot.

gongaa commented 6 months ago

I am glad you mentioned this paper; I wrote down that last point with this particular paper in mind.

I don't think their paper contains enough detail for me to reverse-engineer their method, so the following is my guess. (You should contact them about this question if you are only interested in non-overlapping windows...)

[screenshot of the relevant passage from arXiv:2308.08648]

Whether their BP methods can work to suppress logical errors in experiments containing non-Clifford gates is something I want to learn from their future experiments. Let's wait and see until their neutral-atom platform can do repeated SM (or mid-circuit measurement).

ElonDormancy commented 6 months ago

OK, your detailed reply has resolved all my questions. Thanks a lot! And overlapping decoding is indeed necessary to improve decoding accuracy beyond the non-overlapping strategy.

I set $p=1.5\times 10^{-3}$, used BP on all but the last window, and used the BP+OSD decoder for the last window (which only contains the noiseless SM round); the logical failure rate (per code cycle) stabilizes as the number of code cycles increases for both HGP codes. I agree with you that `np.logical_or(flagged_err, logical_err).astype(int).sum()` is of course an underestimate of the LER, because we must also consider logical errors caused by noisy syndrome measurement feedback.

[plot: logical failure rate per cycle vs. number of code cycles for the HGP codes]