Parvfect / HelixWorks

Code for the Channels and Decoding Methods

Difference in Performance for Zero Codeword #50

Closed Parvfect closed 3 months ago

Parvfect commented 3 months ago

Definite difference in performance for the zero codeword vs the normal case, not quite sure why.

(figure attached)

Parvfect commented 3 months ago

To test: change the likelihood functions and see if it affects the problem.
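
For reference, a minimal sketch of the sort of per-symbol likelihood I mean, assuming a symbol is a k-subset of motifs and each read is a single observed motif. The eps leakage term is a stand-in for the "distracted" behaviour; this is an illustration under those assumptions, not the repo's likelihood code:

```python
from itertools import combinations
import numpy as np

def symbol_log_likelihoods(reads, n_motifs=8, k=4, eps=0.0):
    """Unnormalised log-likelihood of every candidate symbol (a k-subset of
    n_motifs motifs) given a list of observed motif reads.

    eps = 0  -> pure coupon collector: reads only ever come from the symbol.
    eps > 0  -> 'distracted' variant: a read may land on any motif instead.
    """
    symbols = [set(s) for s in combinations(range(n_motifs), k)]
    log_ls = []
    for sym in symbols:
        p_in = (1 - eps) / k + eps / n_motifs   # prob of a motif inside the symbol
        p_out = eps / n_motifs                  # prob of a motif outside the symbol
        log_l = sum(np.log(p_in) if r in sym else np.log(max(p_out, 1e-300))
                    for r in reads)
        log_ls.append(log_l)
    return symbols, np.array(log_ls)

# e.g. symbols, ll = symbol_log_likelihoods([0, 2, 2, 5], eps=0.05)
#      best = symbols[int(np.argmax(ll))]
```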

Parvfect commented 3 months ago

Possible Asymmetry in Channel??

Parvfect commented 3 months ago

Seems like in the uncoded case for the DCC, where I simply select the most likely symbol, the zero CW performs significantly better than the full codeword for read lengths of 20-40. Since we are not decoding at all, this would suggest that the channel is biased towards emitting more 0 symbols than any other, which is extremely bizarre.
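
As a quick sanity check on the "channel is biased" hypothesis, here is a minimal sketch of the kind of experiment I mean. Everything here is an assumption for illustration (the read model, p_distract, which subset plays the role of the zero symbol), not the repo's actual channel:

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical DCC read model: each read is a motif from the true symbol,
# except with probability p_distract it is drawn uniformly from all motifs.
def dcc_reads(symbol, n_reads, n_motifs=8, p_distract=0.2, rng=random):
    return [rng.randrange(n_motifs) if rng.random() < p_distract
            else rng.choice(sorted(symbol)) for _ in range(n_reads)]

# Uncoded pick: take the k most frequently observed motifs as the symbol.
# Note the tie-break falls to lower motif indices, which by itself can
# favour a low-index ("zero"-like) symbol.
def most_frequent_symbol(reads, n_motifs=8, k=4):
    counts = Counter(reads)
    return frozenset(sorted(range(n_motifs), key=lambda m: -counts[m])[:k])

symbols = [frozenset(s) for s in combinations(range(8), 4)]
zero_symbol = symbols[0]     # assumption: the first subset stands in for "0"
other_symbol = symbols[37]   # an arbitrary non-zero symbol

for true in (zero_symbol, other_symbol):
    errors = sum(most_frequent_symbol(dcc_reads(true, n_reads=8)) != true
                 for _ in range(2000))
    print(sorted(true), "symbol error rate:", errors / 2000)
```

If the two error rates differ substantially in a setup like this, the bias sits in the channel/selection rule rather than in the decoder.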

Distracted Coupon Collector Uncoded Zero Codeword vs Full Codeword

(figures attached)

Coupon Collector Uncoded Zero Codeword vs Full Codeword

(figure attached)

But the weird thing is that there is a difference in performance between the zero and normal codeword cases in the Coupon Collector.

(figures attached)

This suggests that the decoder is biased towards decoding the zero codeword.

Similarly, with a decoder for the DCC we have a difference in performance.

Intuition suggests:

- The CC channel is not biased towards zero, but the decoder is (to be tested with masking).
- The DCC channel is biased towards zero, and so is the decoder (to be tested with masking).

(figure attached)

But the inconsistency in the results is becoming annoying: the DCC channel seems biased but its decoder does not, and vice versa for the CC.

Parvfect commented 3 months ago
(figure attached)

So DCC is definitely biased, especially as read lengths get higher

Parvfect commented 3 months ago

Upon masking for the CC decoder, the imbalance goes away.

Masked CC Decoder Performance

(figure attached)

Masked DCC

(figure attached)

So upon masking, the difference goes away. There may still be a slight difference, but as we can see in the FERs, it's random. That's 500 iterations.

FERs:
[0.802, 0.646, 0.476, 0.382, 0.324, 0.236]
[0.8, 0.63, 0.436, 0.364, 0.268, 0.274]

Very minor differences for that many iterations (at 500 iterations each FER point carries roughly ±0.02 of Monte Carlo noise), so masking solves our problem at a minimum.
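
By masking I mean something along these lines (a generic random-coset/scrambling sketch, not necessarily the exact implementation in the repo): add a known uniform random mask to the codeword symbols before the channel, and shift the channel likelihoods back by the mask before running the decoder, so the channel and decoder always see an effectively random word even when the data codeword is all-zero.

```python
import numpy as np

rng = np.random.default_rng()

def apply_mask(codeword, q):
    """Add a known uniform random mask (mod q) before transmission, so the
    transmitted word is uniformly random even for the all-zero codeword."""
    mask = rng.integers(0, q, size=len(codeword))
    return (np.asarray(codeword) + mask) % q, mask

def unmask_likelihoods(channel_likelihoods, mask, q):
    """Shift the per-position symbol likelihoods back by the mask so the
    decoder (e.g. QSPA) operates on the original, unmasked code.

    channel_likelihoods: (n, q) array with P(y_i | transmitted symbol x).
    Returns an (n, q) array whose column x holds P(y_i | data symbol x),
    using the fact that data symbol x was transmitted as (x + mask_i) mod q.
    """
    L = np.asarray(channel_likelihoods)
    out = np.empty_like(L)
    for i, m in enumerate(mask):
        out[i] = L[i, (np.arange(q) + m) % q]
    return out
```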

Parvfect commented 3 months ago

Roman has a hunch that the symbol space and coding scheme are biased and the channel is fine. To verify, let us see what happens in the symmetric case of 5C4. He also suggests that uncoded DCC should be the same.

5C4 comparison (200 iterations) - the change is negligible.

(figure attached)

[0.945, 0.795, 0.52, 0.29, 0.185, 0.085] [0.975, 0.765, 0.52, 0.305, 0.205, 0.105]

Uncoded DCC (500 iterations):

(figure attached)

[0.996, 0.98, 0.968, 0.946, 0.898, 0.868, 0.798, 0.772, 0.676, 0.648, 0.558, 0.494] [0.988, 0.978, 0.97, 0.93, 0.892, 0.856, 0.812, 0.768, 0.684, 0.59, 0.588, 0.486]

Seems like Roman's calculation is right
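
To put a number on "negligible", the mean absolute gap between the two 5C4 FER curves above can be compared against the Monte Carlo noise of the run (the noise estimate just treats each FER point as a binomial proportion over 200 frames):

```python
import numpy as np

zero_fer = np.array([0.945, 0.795, 0.52, 0.29, 0.185, 0.085])
full_fer = np.array([0.975, 0.765, 0.52, 0.305, 0.205, 0.105])

print("mean |dFER|:", np.abs(zero_fer - full_fer).mean())   # ~0.019
# With 200 iterations the per-point standard error is about
# sqrt(p * (1 - p) / 200) <= 0.035, so gaps of ~0.02 are within noise.
```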

Parvfect commented 3 months ago

Uncoded DCC needs to be fixed

Parvfect commented 3 months ago

Add the changing C for coupon collector in decoding errors_fer

DCC uncoded was popping the wrong symbols; the -1, -2, 3 pop is now in place.

(figure attached)

Parvfect commented 3 months ago

Results Collection Pre-Compute (Zero vs Full Codeword)

Distracted Coupon Collector Channel - Graph QSPA

8C4 No Masking

  1. Coding

     FERs: Zero - [1.0, 0.965, 0.755, 0.37, 0.145, 0.055, 0.005, 0.0], CW - [1.0, 0.9, 0.745, 0.39, 0.23, 0.125, 0.07, 0.05] (200 iterations, 6.95% average difference)

     (figure attached)
  2. No Coding

     FERs: Zero - [0.968, 0.934, 0.908, 0.858, 0.8, 0.742, 0.692, 0.61, 0.51, 0.456, 0.378, 0.414, 0.322, 0.25, 0.254], CW - [0.946, 0.944, 0.892, 0.86, 0.8, 0.72, 0.688, 0.578, 0.544, 0.492, 0.408, 0.314, 0.332, 0.288, 0.268] (500 iterations, 3.83% average difference)

     (figure attached)

Masking

  1. Coding

     FERs: Zero - [0.995, 0.885, 0.65, 0.375, 0.225, 0.09, 0.06, 0.06], CW - [0.995, 0.935, 0.67, 0.4, 0.18, 0.095, 0.085, 0.035]

     (figure attached)

5C4

  1. Coding

     (figure attached)

5C2

No Masking

  1. Coding

     (figure attached)
  2. No Coding

     (figure attached)

Masking

  1. Coding

     (figure attached)

Coupon Collector Channel

8C4 No Masking

  1. Coding

     (figure attached)
  2. No Coding

     (figure attached)

Masking

  1. Coding

     (figure attached)

5C4

Unmasked

  1. Coding

     (figure attached)
  2. No Coding

     (figure attached)

Masked

(figure attached)

Parvfect commented 3 months ago

Seems like, as established, there is a definite difference in performance between the zero and full CW for the Coupon Collector decoder (as shown in examples by Roman). This is also supported by the fact that the difference goes away in the case of a symmetric system (5C4).

However, for the DCC there does not seem to be a significant difference compared to the CC, and I am not quite sure why. The gap between 8C4 and 5C4 is negligible for the DCC, whereas for the CC it is massive. This does not make sense, since if the CC decoder is biased, the DCC decoder definitely should be as well.

Lastly, moving forward, we can be sure that masking works and retains the same performance no matter which system we operate in. So the work to be done is to compute the FER curve using masking for both CC and DCC. I am going to try DCC first, since it is the bigger bottleneck, and then follow it up with CC. This is being done in #52.