Parchive / par2cmdline

Official repo for par2cmdline and libpar2
http://parchive.sourceforge.net
GNU General Public License v2.0
723 stars 75 forks source link

A better error correcting code #166

Closed vapniks closed 2 years ago

vapniks commented 2 years ago

Just throwing this out there in case anyone's interested. New mathematical research finds better error correcting codes: https://www.quantamagazine.org/researchers-defeat-randomness-to-create-ideal-code-20211124/

mdnahas commented 2 years ago

Yeah, I saw that. I couldn't make heads or tails of it.

If you can explain it and its value to our users, I'm all ears.

vapniks commented 2 years ago

Here's the preprint: https://arxiv.org/pdf/2111.04808.pdf

mdnahas commented 2 years ago

@vapniks If you can explain it and its value to our users, I'm all ears.

vapniks commented 2 years ago

I don't know the details of the algorithm, but on page 2 of that paper it says it can produce, for any given code rate (proportion of data bits to total bits), good error correcting codes that are almost optimal wrt the Gilbert-Varshamov bound: https://en.wikipedia.org/wiki/Gilbert%E2%80%93Varshamov_bound So I guess this probably means using this algorithm you could squeeze more data onto the disk than a Reed-Solomon code with the same error-correcting ability. Sorry I can't offer any more help, but perhaps the authors of that paper might be able to assist if anyone is thinking of trying to implement it.

mdnahas commented 2 years ago

Reed-Solomon is optimal when it comes to error-correcting ability for a given block size. With K recovery blocks, you can always recover K or fewer errors.

And, in fact, for large K, a random code gets very very close to that bound. Turbocodes and some others are based around the idea that a random code gets very very close to that bound.

The Quanta article talks and the paper say the big difference with this code is that it is a locally testable code. I'm not sure exactly what that means. I think it means that you can look at a handful of bytes in a block and be pretty sure if it is good or bad. Or, it may mean that looking at a few bytes will tell you if the whole input set is recoverable. I'm not sure.

Thanks for letting us know about it. But I don't think we'll be including it in Parchive 3.0.