To test: change the likelihood functions to see if it affects the problem.
Possible Asymmetry in Channel??
It seems like, for the uncoded DCC case (where I simply select the most likely symbol at each position), the zero codeword performs significantly better than the full codeword for read lengths of 20-40. Since we are not decoding at all, this would suggest that the channel is biased towards producing the 0 symbol more than any other. This is extremely bizarre.
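For reference, this is roughly the comparison being run. It is a minimal sketch only, assuming a hypothetical channel simulator that returns per-position symbol likelihoods; `dcc_likelihoods`, `n`, and `q` are placeholders, not the repo's actual code.

```python
import numpy as np

def uncoded_decode(likelihoods):
    """Pick the most likely symbol independently at each position.

    likelihoods: array of shape (n_positions, q) with the channel
    likelihood of each of the q symbols at every position.
    """
    return likelihoods.argmax(axis=1)

def uncoded_fer(codeword, simulate_likelihoods, iterations=500):
    """Monte-Carlo FER estimate for one fixed transmitted codeword.

    simulate_likelihoods(codeword) is a hypothetical stand-in for the
    CC/DCC channel simulator: one call = one transmission, returning
    the (n_positions, q) likelihood matrix for that read.
    """
    errors = 0
    for _ in range(iterations):
        decoded = uncoded_decode(simulate_likelihoods(codeword))
        errors += int(not np.array_equal(decoded, codeword))
    return errors / iterations

# Usage, with hypothetical n, q and channel simulator:
# rng = np.random.default_rng(0)
# zero_cw = np.zeros(n, dtype=int)
# full_cw = rng.integers(0, q, size=n)
# print(uncoded_fer(zero_cw, dcc_likelihoods), uncoded_fer(full_cw, dcc_likelihoods))
```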
Distracted Coupon Collector Uncoded Zero Codeword vs Full Codeword
Coupon Collector Uncoded Zero Codeword vs Full Codeword
But the weird thing is that there is a difference in performance between the zero and the normal codeword case in the Coupon Collector,
which suggests that the decoder is biased towards decoding the zero codeword.
Similarly, with a decoder for the DCC, we see a difference in performance.
Intuition suggests that the CC channel is not biased towards zero but the decoder is (to be tested with masking), while the DCC channel is biased towards zero and so is its decoder (to be tested with masking).
But the inconsistency in the results is becoming annoying: the DCC channel seems biased but its decoder does not, and vice versa for the CC.
So DCC is definitely biased, especially as read lengths get higher
Upon masking the CC decoder, the imbalance goes away.
Masked CC Decoder Performance
Masked DCC
So upon masking, the difference goes away. There may still be a slight difference, but as we can see in the FERs it is down to randomness. That's 500 iterations.
FERs: [0.802, 0.646, 0.476, 0.382, 0.324, 0.236] vs [0.8, 0.63, 0.436, 0.364, 0.268, 0.274]. Very minor differences for that many iterations, so masking solves our problem at a minimum.
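For clarity, this is the kind of masking being referred to. It is a minimal sketch, assuming masking is implemented by adding a uniformly random codeword before transmission and stripping it after decoding, so the word on the channel is uniform over the code whether we start from the zero or the full codeword; `encode`, `transmit`, and `decode` are hypothetical stand-ins, not the repo's actual API.

```python
import numpy as np

def masked_trial(codeword, q, k, encode, transmit, decode, rng):
    """One masked transmission; returns True on a frame error.

    encode / transmit / decode are hypothetical stand-ins for the encoder,
    the CC/DCC channel simulator, and the QSPA decoder respectively.
    """
    # Encoding a uniform random message gives a uniform random mask codeword,
    # so (codeword + mask) mod q is still a codeword of the same code.
    mask = encode(rng.integers(0, q, size=k))
    masked_cw = (codeword + mask) % q        # word actually put on the channel
    decoded_masked = decode(transmit(masked_cw))
    decoded = (decoded_masked - mask) % q    # strip the mask again
    return not np.array_equal(decoded, codeword)
```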
Roman has a hunch that the symbol space and coding scheme are biased and that the channel is fine. To verify, let us see what happens in the symmetric case of 5C4. He also suggests that the uncoded DCC should be the same for both codewords.
5C4 comparison (200 iterations): the change is negligible.
[0.945, 0.795, 0.52, 0.29, 0.185, 0.085] [0.975, 0.765, 0.52, 0.305, 0.205, 0.105]
Uncoded DCC (500 iterations)
[0.996, 0.98, 0.968, 0.946, 0.898, 0.868, 0.798, 0.772, 0.676, 0.648, 0.558, 0.494] [0.988, 0.978, 0.97, 0.93, 0.892, 0.856, 0.812, 0.768, 0.684, 0.59, 0.588, 0.486]
Seems like Roman's calculation is right
Uncoded DCC needs to be fixed
Add the changing C for coupon collector in decoding errors_fer
DCC uncoded - was popping the wrong symbols; the -1, -2, 3 pop is now in place.
Distracted Coupon Collector Channel - Graph QSPA
8C4 No Masking
Coding
FERs: Zero - [1.0, 0.965, 0.755, 0.37, 0.145, 0.055, 0.005, 0.0], CW - [1.0, 0.9, 0.745, 0.39, 0.23, 0.125, 0.07, 0.05] (200 iterations, 6.95% average difference)
No Coding
FERs: Zero - [0.968, 0.934, 0.908, 0.858, 0.8, 0.742, 0.692, 0.61, 0.51, 0.456, 0.378, 0.414, 0.322, 0.25, 0.254], CW - [0.946, 0.944, 0.892, 0.86, 0.8, 0.72, 0.688, 0.578, 0.544, 0.492, 0.408, 0.314, 0.332, 0.288, 0.268] (500 iterations, 3.83% average difference)
Masking
Coding
FERs: Zero - [0.995, 0.885, 0.65, 0.375, 0.225, 0.09, 0.06, 0.06], CW - [0.995, 0.935, 0.67, 0.4, 0.18, 0.095, 0.085, 0.035]
5C4
Coding
5C2
No Masking
Coding
No coding
Masking
Coding
Coupon Collector Channel
8C4 No Masking
Coding
No Coding
Masking
Coding
5C4
Unmasked
Coding
No Coding
Masked
As established, it seems there is a definite difference in performance between the zero and the full codeword for the Coupon Collector decoder (as shown in the examples by Roman). This is also supported by the fact that the difference goes away in the case of a symmetric system (5C4).
However, for the DCC there does not seem to be a significant difference compared to the CC, and I am not quite sure why. For the DCC the gap between 8C4 and 5C4 is negligible, whereas for the CC there is a massive difference. This does not make sense, since if the CC decoder is biased, the DCC decoder definitely should be as well.
Lastly, moving forward, we can be sure that masking works and retains the same performance no matter which system we operate in. So the work to be done is to compute the FER curve using masking for both the CC and the DCC. I am going to try the DCC first, since it is the bigger bottleneck, and then follow it up with the CC. This is being done in #52.
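As a rough sketch of that masked FER sweep, reusing the hypothetical `masked_trial` helper from the masking sketch above; `make_channel` and the other callables are placeholders for the actual CC/DCC channel and QSPA decoder interfaces:

```python
import numpy as np

def masked_fer_curve(read_lengths, iterations, codeword, q, k,
                     encode, make_channel, decode, seed=0):
    """Estimate the FER at each read length using masked transmissions."""
    rng = np.random.default_rng(seed)
    fers = []
    for read_length in read_lengths:
        transmit = make_channel(read_length)   # CC or DCC simulator for this read length
        errors = sum(
            masked_trial(codeword, q, k, encode, transmit, decode, rng)
            for _ in range(iterations)
        )
        fers.append(errors / iterations)
    return fers
```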
There is a definite difference in performance for the zero codeword vs the normal case; I am not quite sure why.