Problem about the Sliding Window OSD file

I'm reading the file SlidingWindowDecoder Sliding Window OSD.ipynb to follow it,and there is some confusing things about the code.

First when we iterate the anchors to get the sparse parity-check matrix of each window(i guess like the $H_{win}$ mentioned in shilin's paper [https://arxiv.org/abs/2311.03307]) and if the window is not the final(last) window,thec = anchors[top_left + W - 1].But when we use the $H_{win}$ to decode,c = anchors[top_left+F] # commit region bottom rightWhy the two c are different? It makes me confused and it seems that when I set the W=3,F=3 and run the Sliding Window OSD.ipynb it will raise error :

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 sliding_window_decoder(N=144, p=0.004, num_repeat=12, W=3, F=3, num_shots=10000, max_iter=200, method=1, 
      2                        z_basis=True)
      4 plt.hist([x*1000 for x in decoding_time]) # convert s to ms

Cell In[4], line 154, in sliding_window_decoder(N, p, num_repeat, num_shots, max_iter, W, F, z_basis, noisy_prior, method, plot, shorten)
    152         total_e_hat[j][a[1]:b[1]] = e_hat
    153     else:
--> 154         total_e_hat[j][a[1]:c[1]] = e_hat[:c[1]-a[1]]
    157 print(f"Window {i}, flagged Errors: {num_flag_err}/{num_shots}")
    159 new_det_data = (det_data + total_e_hat @ chk.T) % 2

ValueError: could not broadcast input array from shape (1656,) into shape (2088,)

So which c is correct?

Hi, this broadcast issue is due to column-merging of the bottom right part of $H_{win}$, caused by method=1. The merging is described at the end of Round Analysis.ipynb and our paper. Please change method=1 to method=0 for no merging when using non-overlapping windows $W=F$, and please also change c=anchors[top_left+F] to c = anchors[min(top_left+F, len(anchors)-1)].

I extract c = anchors[top_left+W-1] so that part of the window can be stacked with the merged columns (identity matrix). It has nothing to do with the later c = anchors[top_left+F] for determining the commit region.

Hi,Anqi,Sorry to bother you again,I still have some problems about the sliding window decoder details.

Question 1:About columns merging when method=1

As you mentioned in Round Analysis.ipynb

To toggle between different level of column merging, change method below to 0 (no merging), 1 (mild merging, most recommended), 2 (merge all columns that trigger only the detectors from the last round in the window).

method=0 means no merging that is easy to understand,method=2 you merge the last SM circuit detectors in one window.But the method=1 makes me confused, you overwrite the c=(c[0], c[1]+n_half*3) in Z basis(X basis).Why you add n_half*3 in c[1] which represents a single fault realization of the syndrome measurement . n_half*3 represents the next round SM Z stabilizer detectors and n_half*2=n represents the next round SM X stabilizer detectors. And what is the merging strategy for method=1?

To improve efficiency, we merge the columns of the bottom right (full-rank) matrix $H_1$ in each window to an identity matrix and recalculate priors.

[cite from your paper]. And your code is:

noisy_syndrome_prior= np.sum(chk[c[0]:b[0],c[1]:b[1]]* priors[c[1]:b[1]], axis=1)#you recalculate the priors.
noisy_syndrome= np.zeros((n_half*W,n_half))
noisy_syndrome[-n_half:,:]= np.eye(n_half)# * noisy_syndrome_prior

Why did you comment out noisy_syndrome_prior?

Question 2:XOR

Yes, when we match the syndrome like $(\sigma_1,\sigma_2,\sigma_3\cdots\sigma_N)\rightarrow(\sigma_1,\sigma_2-\sigma_1,\sigma_3-\sigma_2\cdots\sigmaN-\sigma{N-1})$,one can see that no single fault triggers detectors from more than two rounds.But when we divided them into many windows like (3,1) Sliding Window, we will get the first window syndrome $\text{SM}_1 = (\sigma_1,\sigma_2-\sigma_1,\sigma_3-\sigma_2)$ it’s right,but when we decode the second window, $\text{SM}_2 = (\sigma_2-\sigma_1,\sigma_3-\sigma_2,\sigma_4-\sigma_3)$ not the correct $\text{SM}_2^c = (\sigma_2,\sigma_3-\sigma_2,\sigma_4-\sigma_3)$.which one is correct?

And the final round of syndrome measurement from the destructive measurement of data qubits ,which is more exact than the SM get from the syndrome measurement circuit. The SM is $\sigmaN-\sigma{N-1}$ not the $\sigma_N$. So if i want to use more accurate decoder for the final round syndrome measurement, which one is correct?

No worries :) I'm happy that somebody really cares about the paper, and I also want to take the chance to clarify some subtle points through your questions.

Question 1: First note that everything I wrote, the paper, the code, is specific for the BB code and for the particular circuit proposed by IBM, if you change the circuit or EC code, then the offset won't be n_half*3, and the shape of $\mathbf{H}_0, \mathbf{H}_1, \mathbf{H}_2$ that I described in Fig. 1 caption of our paper wouldn't hold (but the SC-LDPC like structure will remain if X and Z are decoded separately, so you can reuse most of my code after figuring out what their shapes are for your own purpose).

Now come back to your question, why for method 1, I overwrote c=(c[0], c[1]+n_half*3). Before this statement, within a window, c is the anchor pointing at the top left of bottom-most $\mathbf{H}_0$ (please have Fig. 1 by your side). Since $\mathbf{H}_0$ have shape $(N/2, N/2*3)$, after this assignment, c points to the top left of bottom-most $\mathbf{H}_1$.

The next statement, noisy_prior = np.sum(chk[c[0]:b[0],c[1]:b[1]] * priors[c[1]:b[1]], axis=1). chk[c[0]:b[0],c[1]:b[1]] is this $\mathbf{H}_1$, and priors[c[1]:b[1]] is the priors associated with each column of this $\mathbf{H}_1$. Remember that a parity-check matrix is always a zero-one matrix (this also answers your next question, # * noisy_syndrome_prior is a shorthand for 'this new prior will be associated to the merged column', please forgive my laziness in this comment). The merging strategy regarding the prior is just for each row (detector), sum up the priors of the columns of $\mathbf{H_1}$ (fault mechanism) that triggers this detector. It turns out that, for this particular IBM circuit, the symmetry leads to each detector having the same value, you can print out noisy_prior to see. That's why I only print out the first one noisy_prior[0] in the print statement. This merging strategy is indeed very heuristic, so I also allow users to overwrite noisy_prior from None to anything they like in the decoding interface.

For the X basis, I recommend using offset n_half*3. But I still left n_half*2 there. Please take a look at the window PCMs from experiments z_basis=False in OSD.ipynb. Though $\mathbf{H}_0$ still has shape $(N/2, N/2*3)$, it looks a little different from Z basis $\mathbf{H}_0$. The last $N/2$ columns from $\mathbf{H}_0$ in X basis all have the same prior (see Round Analysis.ipynb for partitioning columns according to their priors). I don't have this for free after permuting columns in (the more standard) Z basis, so I thought now we have more opportunities to study the effect of merging strength on logical error rate in X basis.

Question 2: Neither is correct, the next syndrome should have dependency on the first few columns that you committed to before moving to the next window. Maybe this sentence from the paper can answer your question?

By committing to $\hat{\mathbf{e}}_0$ and $\hat{\mathbf{e}}_1$, we need to update $\mathbf{s}_2$ used for the next window by $\mathbf{s}'_2 = \mathbf{s}_2 + \mathbf{H}_2\hat{\mathbf{e}}_1$, then use $\mathbf{s}'_2, \mathbf{s}_3, \mathbf{s}_4$ to decode the next window.

The $\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3, \dots$ detector values returned by Stim correspond to your $\sigma_1, \sigma_2-\sigma_1, \sigma_3-\sigma_2, \dots$. If that doesn't answer your question, then perhaps you have some confusion in circuit code construction. Maybe the following helps: each row is already a detector (see row 43 of src/build_circuit.py), i.e., it gives you $s_1,s_2,s_3$, not $\sigma_1,\sigma_2,\sigma_3$.

I'm not sure I understand your last question. From my implementation in /src/build_circuit.py, you can see I first calculate $\sigmaN$ (noiseless syndrome calculation from collapsed data qubit measurement result, line 215,216), then I XOR it with $\sigma{N-1}$ (line 217). So this detector (row) is $\sigmaN-\sigma{N-1}$. For the sparsity of PCM and thus better performance of BP-related methods, I do think this choice makes more sense than $\sigma_N$ alone. If you use $\sigma_N$ for detectors, then fault mechanisms from any round (even the first) can trigger them, you have to rewrite the syndrome update when it comes to the last window as well.

Last point: I want to emphasize that one should not count on the noiseless syndrome too much, because they won't be available before non-Clifford gates.

Thanks for your detailed reply:),i have learned a lot ! i may now have a better understanding of method=1 .as commented above when we choose method=1 we merge the $H_1$ because of the symmetry of the IBM code. So we can merge it.

But for the question 2,i add more information about my question so we can discuss further.

As you mentioned in your paper

For example, assume detector values $s_1, s_2, s_3$ . . . are observed

and in your src/build_circuit.py

detector_circuit_str = ""
for i in range(n//2):
    detector_circuit_str += f"DETECTOR rec[{-n//2+i}]\n"
    detector_circuit = stim.Circuit(detector_circuit_str)

detector_repeat_circuit_str = ""
for i in range(n//2):
    detector_repeat_circuit_str += f"DETECTOR rec[{-n//2+i}] rec[{-n-n//2+i}]\n"
    detector_repeat_circuit = stim.Circuit(detector_repeat_circuit_str)

We can see your XOR operations to realize no single fault triggers detectors from more than two rounds.We call the SM result is $\sigma_i$ so the syndromes of multi-rounds are $(\sigma1,\sigma{2}-\sigma1\cdots,\sigma{N}-\sigma_{N-1})\rightarrow(s_1,s_2,s_3\cdots,s_N)$

When we decode the first window we use the syndrome result $(s_1,s_2,s_3)$ for example $(W,F)=(3,3)$ window.And the next window is $(s_4,s_5,s_6)$ of course if we use the overlaping window like $(3,1)$ we must update the $s_2' = s_2 +H_2 \hat{e}_1$.We correct all the qubit and measurement errors in time interval $(0, F]$ and keep the updated syndrome vectors in time interval $[F + 1, W ]$ for the next error correction (EC) cycle.But when we use the non-overlaping window, we may do not update the syndrome(it’s correct?).So when we use the non-overlaping window, we need to change the src/build_circuit.py like-

# Z check detectors
if z_basis:
    if num_round%3!=0:#(W,F)=(3,3) as example
        circuit += detector_repeat_circuit
    else:
        circuit += detector_circuit

to let the multi-round syndromes are:

first two windows as example: $(\sigma_1,\sigma_2-\sigma_1,\sigma_3-\sigma_2,\sigma_4,\sigma_5-\sigma_4,\sigma_6-\sigma_5)$ not the $(\sigma_1,\sigma_2-\sigma_1,\sigma_3-\sigma_2,\sigma_4-\sigma_3,\sigma_5-\sigma_4,\sigma_6-\sigma_5)$.That’s the supplementary information for my question 2.

In paper https://arxiv.org/abs/2308.08648 he use the BP decoder on intervals of noisy cycles to control error accumulation. For the final round, they apply the BP+OSD decoder to eliminate residual syndromes and project the system into the codespace. And your point about the final round that If i use $\sigma_N$ for detectors, then fault mechanisms from any round (even the first) can trigger them,but if we use $\sigmaN-\sigma{N-1}$ we only consider the fault mechanisms from the last round and penultimate round SM. Do I understand correctly? and now if we use the non-overlaping window,in the final round which one is better?

And i agree with your last point ! even though it will decrease the logical memory error rate a lot.

I am glad you mentioned this paper, I wrote down the last point with this particular paper in my mind.

I don't think their paper contains enough detail for me to reverse-engineer their method, so the following is my guess. (You should contact them for this question if you are only interested in non-overlapping window...)

I think using $\sigma_4$ instead of $\sigma_4-\sigma_3$ is only causing you extra labor, because you need to update syndrome for the next window anyway (though they didn't mention this). The reason is as follows. Consider the error string that happened just before your SM that leads to $\sigma_1$, call it $x_1$, and also consider $x_4$ that leads to $sigma_4$. After three SM, you will expect $wt(x_4)>>wt(x_1)$, and therefore $wt(\sigma_4)\gg wt(\sigma_1)$. As you can see from the figure below, where I rewrote build_circuit.py using your suggestion. Feeding such a high-weight syndrome to BP is doomed to fail, plus you will still want to use window PCMs chk[:108,:1044], chk[108:216, 1044:2124] to decode, instead of also including the columns to their left. So what you will do is to update $\sigma_4$ by XORing it with chk[108:144, :1044] @ e_hat. This will be equivalent to constructing the DEM using $\sigma_4-\sigma_3$ in the first place. Btw, in Z-basis, you are not allowed to put $\sigma_X$ for X stabilizers as detectors (if use_both=True) since it's a random string in the absence of noise, i.e., you must do XOR.

Screenshot 2024-04-25 at 19 36 28

For the last noiseless SM round, I do have a lot to say. I mentioned in my paper, for GDG (no convergence guarantee), I always require Eq 8 and 9 both to hold for the decoding to be deemed as a success. For a week or so when I first started the sliding window project, I thought the simulation result in their paper was wrong, because by using BP alone, Eq 8 failed 10% of the time (say at $p=0.03$). Then I realized this overestimation problem, which I mentioned in Appendix D. Naively, you can count errors like num_err=logical_err.astype(int).sum() (instead of num_err = np.logical_or(flagged_err, logical_err).astype(int).sum()), which is of course an underestimation of LER, but you can conduct this simulation by replacing BP+OSD with BP on all previous windows, even in an overlapping window setting, and you can compare the LER to all window BP+OSD. I remember you can still see quite a significant performance drop. For their method, see, by decoding using BP+OSD on $\sigma_N$ alone (I mean no syndrome update via previous decoding etc.) already can give you a correction operation that sends you back to the code space, so you get $H_Z(x+\hat{x})=0$ for free thanks to OSD. Now, let's compare the two methods, either using BP+OSD or BP on all but the last window (only contains this noiseless SM round), I think it will be intuitive to say $\sigma'_N$ after the syndrome update from the BP+OSD method will have a smaller weight than from the BP method. Therefore, when you feed it to the decoder, it's easier for it to get the answer right, leading to a smaller LER.
Go back to your question of whether you should use $\sigma_N$ or $\sigmaN-\sigma{N-1}$, I hope it is already clear that they are equivalent. If you don't want to include the leftmost columns in your PCM for decoding, and if you want the input syndrome to have low enough weights. You have to update your syndrome before feeding it to the decoder for the $\sigma_N$ method, the updated syndrome will be the same, and the PCM you extract for this stand-alone round will be the same as well chk[324:360, 3204:3312] which is just $H_Z$ vertical stacked with some CNOT XX fault columns.
Finally, about how to avoid over/underestimating LER for decoder methods without convergence guarantee. Put the single data measurement each as a detector, i.e., no combination of them according to PCM, and also define no logical OBSERVABLE. You can get those things by post-processing DEM sampler's result yourself. By using this method, you will be able to know the final error string $x$ (you can only compare to it, you cannot use it instead of its syndrome in decoding). You will also be able to know what's the end error string of each fault mechanism (column), then you can obtain $\hat{x}$ for your column estimation.

Whether their BP methods could work in suppressing logical errors in experiments containing non-Clifford gates, is something I want to learn from their future experiments. Let's wait and see until their neutral atom platform can do repeated SM (or mid-circuit measurement).

OK,your detailed reply has resolved all my questions.Thanks a lot ! And overlapping decoding is necessary for improvement of decoding accuracy beyond the non-overlapping strategy.

I set the $p=1.5\times 10^{-3}$ and using BP on all but the last window (only contains this noiseless SM round) and final noiseless SM round i use the BP+OSD decoder to decode and the logical failure rate (per code cycle) stabilizes as the number of code cycles increases for both the HGP codes.I agree with you that np.logical_or(flagged_err, logical_err).astype(int).sum()) is of course an underestimation of LER. Because we must consider the logical error led by noisy syndrome measurement feedback. HGP_Number_of_cycles

gongaa / SlidingWindowDecoder

Problem about the Sliding Window OSD file #1