Parvfect / HelixWorks

Code for the Channels and Decoding Methods

Difference in Performance for Zero Codeword #50

Closed Parvfect closed 3 months ago

Parvfect commented 3 months ago

Definite difference in performance for the zero codeword vs the normal case, not quite sure why.

(figure attached)

Parvfect commented 3 months ago

To test: change the likelihood functions and see if it affects the problem.
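
For reference, a minimal sketch of the sort of per-symbol likelihood I mean, assuming a symbol is a k-subset of motifs and each read is a single observed motif. The eps leakage term is a stand-in for the "distracted" behaviour; this is an illustration under those assumptions, not the repo's likelihood code:

```python
from itertools import combinations
import numpy as np

def symbol_log_likelihoods(reads, n_motifs=8, k=4, eps=0.0):
    """Unnormalised log-likelihood of every candidate symbol (a k-subset of
    n_motifs motifs) given a list of observed motif reads.

    eps = 0  -> pure coupon collector: reads only ever come from the symbol.
    eps > 0  -> 'distracted' variant: a read may land on any motif instead.
    """
    symbols = [set(s) for s in combinations(range(n_motifs), k)]
    log_ls = []
    for sym in symbols:
        p_in = (1 - eps) / k + eps / n_motifs   # prob of a motif inside the symbol
        p_out = eps / n_motifs                  # prob of a motif outside the symbol
        log_l = sum(np.log(p_in) if r in sym else np.log(max(p_out, 1e-300))
                    for r in reads)
        log_ls.append(log_l)
    return symbols, np.array(log_ls)

# e.g. symbols, ll = symbol_log_likelihoods([0, 2, 2, 5], eps=0.05)
#      best = symbols[int(np.argmax(ll))]
```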

Parvfect commented 3 months ago

Possible Asymmetry in Channel??

Parvfect commented 3 months ago

Seems like in the uncoded case for the DCC, where I simply select the most likely symbol, the zero CW performs significantly better than the full codeword for read lengths of 20-40. Since we are not decoding at all, this would suggest that the channel is biased towards emitting more 0 symbols than any other, which is extremely bizarre.
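
As a quick sanity check on the "channel is biased" hypothesis, here is a minimal sketch of the kind of experiment I mean. Everything here is an assumption for illustration (the read model, p_distract, which subset plays the role of the zero symbol), not the repo's actual channel:

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical DCC read model: each read is a motif from the true symbol,
# except with probability p_distract it is drawn uniformly from all motifs.
def dcc_reads(symbol, n_reads, n_motifs=8, p_distract=0.2, rng=random):
    return [rng.randrange(n_motifs) if rng.random() < p_distract
            else rng.choice(sorted(symbol)) for _ in range(n_reads)]

# Uncoded pick: take the k most frequently observed motifs as the symbol.
# Note the tie-break falls to lower motif indices, which by itself can
# favour a low-index ("zero"-like) symbol.
def most_frequent_symbol(reads, n_motifs=8, k=4):
    counts = Counter(reads)
    return frozenset(sorted(range(n_motifs), key=lambda m: -counts[m])[:k])

symbols = [frozenset(s) for s in combinations(range(8), 4)]
zero_symbol = symbols[0]     # assumption: the first subset stands in for "0"
other_symbol = symbols[37]   # an arbitrary non-zero symbol

for true in (zero_symbol, other_symbol):
    errors = sum(most_frequent_symbol(dcc_reads(true, n_reads=8)) != true
                 for _ in range(2000))
    print(sorted(true), "symbol error rate:", errors / 2000)
```

If the two error rates differ substantially in a setup like this, the bias sits in the channel/selection rule rather than in the decoder.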

Distracted Coupon Collector Uncoded Zero Codeword vs Full Codeword

(figures attached)

Coupon Collector Uncoded Zero Codeword vs Full Codeword

(figure attached)

But the weird thing is that there is a difference in performance between the zero and normal codeword cases in the Coupon Collector.

(figures attached)

This suggests that the decoder is biased towards decoding the zero codeword.

Similarly, with a decoder for the DCC we have a difference in performance.

Intuition suggests:

- The CC channel is not biased towards zero, but the decoder is (to be tested with masking).
- The DCC channel is biased towards zero, and so is the decoder (to be tested with masking).

(figure attached)

But the inconsistency in the results is becoming annoying: the DCC channel seems biased but its decoder does not, and vice versa for the CC.

Parvfect commented 3 months ago
(figure attached)

So DCC is definitely biased, especially as read lengths get higher

Parvfect commented 3 months ago

Upon masking for the CC decoder, the imbalance goes away.

Masked CC Decoder Performance

(figure attached)

Masked DCC

(figure attached)

So upon masking, the difference goes away. There may still be a slight difference, but as we can see in the FERs, it's random. That's 500 iterations.

FERs:
[0.802, 0.646, 0.476, 0.382, 0.324, 0.236]
[0.8, 0.63, 0.436, 0.364, 0.268, 0.274]

Very minor differences for that many iterations (at 500 iterations each FER point carries roughly ±0.02 of Monte Carlo noise), so masking solves our problem at a minimum.
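
By masking I mean something along these lines (a generic random-coset/scrambling sketch, not necessarily the exact implementation in the repo): add a known uniform random mask to the codeword symbols before the channel, and shift the channel likelihoods back by the mask before running the decoder, so the channel and decoder always see an effectively random word even when the data codeword is all-zero.

```python
import numpy as np

rng = np.random.default_rng()

def apply_mask(codeword, q):
    """Add a known uniform random mask (mod q) before transmission, so the
    transmitted word is uniformly random even for the all-zero codeword."""
    mask = rng.integers(0, q, size=len(codeword))
    return (np.asarray(codeword) + mask) % q, mask

def unmask_likelihoods(channel_likelihoods, mask, q):
    """Shift the per-position symbol likelihoods back by the mask so the
    decoder (e.g. QSPA) operates on the original, unmasked code.

    channel_likelihoods: (n, q) array with P(y_i | transmitted symbol x).
    Returns an (n, q) array whose column x holds P(y_i | data symbol x),
    using the fact that data symbol x was transmitted as (x + mask_i) mod q.
    """
    L = np.asarray(channel_likelihoods)
    out = np.empty_like(L)
    for i, m in enumerate(mask):
        out[i] = L[i, (np.arange(q) + m) % q]
    return out
```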

Parvfect commented 3 months ago

Roman has a hunch that the symbol space and coding scheme are biased and the channel is fine. To verify, let us see what happens in the symmetric case of 5C4. He also suggests that uncoded DCC should be the same.

5C4 comparison (200 iterations) - the change is negligible.

(figure attached)

[0.945, 0.795, 0.52, 0.29, 0.185, 0.085] [0.975, 0.765, 0.52, 0.305, 0.205, 0.105]

Uncoded DCC (500 iterations):

(figure attached)

[0.996, 0.98, 0.968, 0.946, 0.898, 0.868, 0.798, 0.772, 0.676, 0.648, 0.558, 0.494] [0.988, 0.978, 0.97, 0.93, 0.892, 0.856, 0.812, 0.768, 0.684, 0.59, 0.588, 0.486]

Seems like Roman's calculation is right
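
To put a number on "negligible", the mean absolute gap between the two 5C4 FER curves above can be compared against the Monte Carlo noise of the run (the noise estimate just treats each FER point as a binomial proportion over 200 frames):

```python
import numpy as np

zero_fer = np.array([0.945, 0.795, 0.52, 0.29, 0.185, 0.085])
full_fer = np.array([0.975, 0.765, 0.52, 0.305, 0.205, 0.105])

print("mean |dFER|:", np.abs(zero_fer - full_fer).mean())   # ~0.019
# With 200 iterations the per-point standard error is about
# sqrt(p * (1 - p) / 200) <= 0.035, so gaps of ~0.02 are within noise.
```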

Parvfect commented 3 months ago

Uncoded DCC needs to be fixed

Parvfect commented 3 months ago

Add the changing C for coupon collector in decoding errors_fer

DCC uncoded was popping the wrong symbols; the -1, -2, 3 pop is now in place.

(figure attached)

Parvfect commented 3 months ago

Results Collection Pre-Compute (Zero vs Full Codeword)

Distracted Coupon Collector Channel - Graph QSPA

8C4 No Masking

  1. Coding

     FERs: Zero - [1.0, 0.965, 0.755, 0.37, 0.145, 0.055, 0.005, 0.0], CW - [1.0, 0.9, 0.745, 0.39, 0.23, 0.125, 0.07, 0.05] (200 iterations, 6.95% average difference)

     (figure attached)
  2. No Coding

     FERs: Zero - [0.968, 0.934, 0.908, 0.858, 0.8, 0.742, 0.692, 0.61, 0.51, 0.456, 0.378, 0.414, 0.322, 0.25, 0.254], CW - [0.946, 0.944, 0.892, 0.86, 0.8, 0.72, 0.688, 0.578, 0.544, 0.492, 0.408, 0.314, 0.332, 0.288, 0.268] (500 iterations, 3.83% average difference)

     (figure attached)

Masking

  1. Coding

     FERs: Zero - [0.995, 0.885, 0.65, 0.375, 0.225, 0.09, 0.06, 0.06], CW - [0.995, 0.935, 0.67, 0.4, 0.18, 0.095, 0.085, 0.035]

     (figure attached)

5C4

  1. Coding

     (figure attached)

5C2

No Masking

  1. Coding

     (figure attached)
  2. No Coding

     (figure attached)

Masking

  1. Coding

     (figure attached)

Coupon Collector Channel

8C4 No Masking

  1. Coding

     (figure attached)
  2. No Coding

     (figure attached)

Masking

  1. Coding

     (figure attached)

5C4

Unmasked

  1. Coding

     (figure attached)
  2. No Coding

     (figure attached)

Masked

(figure attached)

Parvfect commented 3 months ago

Seems like, as established, there is a definite difference in performance between the zero and full CW for the Coupon Collector decoder (as shown in examples by Roman). This is also supported by the fact that the difference goes away in the case of a symmetric system (5C4).

However, for the DCC there does not seem to be a significant difference compared to the CC, and I am not quite sure why. The gap between 8C4 and 5C4 is negligible for the DCC, whereas for the CC it is massive. This does not make sense, since if the CC decoder is biased, the DCC decoder definitely should be as well.

Lastly, moving forward, we can be sure that masking works and retains the same performance no matter which system we operate in. So the work to be done is to compute the FER curve using masking for both CC and DCC. I am going to try DCC first, since it is the bigger bottleneck, and then follow it up with CC. This is being done in #52.