biocore / deblur

Deblur is a greedy deconvolution algorithm based on known read error profiles.
BSD 3-Clause "New" or "Revised" License
92 stars 41 forks source link

Implementation detail #161

Closed wwood closed 6 years ago

wwood commented 7 years ago

Hi,

I'm looking at using deblur for a related error correction problem, and I noticed that in the pseudocode provided in the supplementary of the paper it says:

For i in NReads :
...
# iterate through the remaining reads.
# Note: because we only loop over j > i, preceding counts 
# will not be reduced by future ones.
for j>i, j<=NReads:
...

But looking at the code it appears to be iterating over each j!=i, not j>i (from https://github.com/biocore/deblur/blob/master/deblur/deblurring.py#L141 ):

 for seq_i in seqs:
        ...
        for seq_j in seqs:
            # Ignore current sequence
            if seq_i == seq_j:
                continue
            ...

Am I misinterpreting the code or pseudocode? Has there been some update since submission of the article? I'm not entirely sure how much difference it makes in any case (2 closely sequences with high counts maybe?), but just thought I'd report it after seeing it.

Thanks, ben

amnona commented 7 years ago

Hi Ben, thanks for your comment. you are correct. There is a mistake is in the pseudocode. It should read:

Sort sequences in decreasing Counts[i] order
for i=1...NReads, for j=1...NReads:

(and this is the way it is implemented in deblur) (instead of for j>i, j<=NReads:)

The reason for iterating over all i,j pairs is that read error leakage is symmetrical (i.e. if Si contributes reads to Sj, Sj will also contribute reads to Si). Like you wrote, it should not make a big difference in most cases, but can affect the frequencies more in the case of two close (hamming-wise) and similar frequency sequences.

Thanks for spotting this error in the pseudocode. And let us know if you have more questions/comments or if you want to discuss more the error correction problem you are working on.

Amnon

wasade commented 6 years ago

Closing as it seems like the question has been answered.