IIRC the first issue you reported is a limitation of the code. It would require some large changes to the codec to support what you'd like to do.
In the paper author's example code it corresponds to a different encode function:
I currently don't have the time to work on this large a change, sorry.
No worries! Thank you for your insights! If I understand you correctly, you believe the original paper could be adapted to allow more parity symbols than data symbols?
Assuming so, we'd likely implement that algorithm in Rust more naively, so that it's more readable for us and our auditors. We'd then explore your fast optimized code for ideas on better exploiting AVX2, etc.
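Concretely, the naive low rate structure I have in mind looks roughly like the sketch below: recover the polynomial's coefficients from the k data symbols with an inverse transform, then re-evaluate it on as many further cosets of size k as we want parity blocks. The ifft_coset and fft_coset helpers here are hypothetical stand-ins for the paper's basis transform and its inverse, so this is only meant to show where the extra parity comes from, not to match the paper's or leopard's actual code.

```rust
// A rough sketch only: `ifft_coset` / `fft_coset` are hypothetical helpers
// implementing the paper's basis transform and its inverse on a length-k
// coset starting at the given offset; they are not part of leopard's API.
type Symbol = u16; // a GF(2^16) element

fn encode_low_rate(
    data: &[Symbol],      // k data symbols
    parity_blocks: usize, // how many extra cosets of k parity symbols to emit
    ifft_coset: impl Fn(&mut [Symbol], usize),
    fft_coset: impl Fn(&mut [Symbol], usize),
) -> Vec<Vec<Symbol>> {
    let k = data.len();

    // Interpret the data as evaluations on coset 0 and recover the
    // polynomial's coefficients in the special basis.
    let mut coeffs = data.to_vec();
    ifft_coset(coeffs.as_mut_slice(), 0);

    // Re-evaluate the same polynomial on each further coset of size k,
    // producing k parity symbols per coset. Nothing here caps the number
    // of cosets at one, which is why more parity than data should work.
    (1..=parity_blocks)
        .map(|i| {
            let mut block = coeffs.clone();
            fft_coset(block.as_mut_slice(), i * k);
            block
        })
        .collect()
}
```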
I'll close this issue for now since it goes beyond the scope of this library.
Yes it's expected to work when coded up. The paper does describe an algorithm for this as well.
Along with some superficial issues, it appears his encodeL does not implement Algorithm 1 in https://github.com/catid/leopard/blob/master/docs/LowRateDecoder.pdf, and it does not work right now. In fact, encodeL looks quite close to Algorithm 1 from https://arxiv.org/pdf/1404.3458.pdf, although the difference might be rather minimal if the FFT absorbed or replaced these twisting coefficients.

It's likely @SianJhengLin implemented three versions for himself when writing these three papers, and encodeL is simply a remnant, not what he expected people to be most interested in.
I'll try to better understand the algorithm in the near future, and especially figure out whether these different versions even use the same decoder. Thanks again!
Ah, it could be broken; I never tested that code path.
We actually noticed a plausible high rate vs. low rate difference in computing error locator polynomials in the decoder too, but only while reconstructing bits from first principles, so it's not necessarily relevant. Basically, we think the low rate case benefits from using the formal derivative again.
I suspect the division should be much lower than 50% for the high rate trick there, so maybe the high rate version would only rarely be optimal, but probably the actual code uses some more fair/uniform trick.
We actually care relatively little about performance for the error locator polynomial, since we'll reuse the same error locator thousands of times, a la a 10 MB block size but under 400 bytes of output per decoder run.
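To make the formal derivative remark concrete: over a binary field the derivative only needs one pass over the coefficients, because the scalar factor i in i * a_i vanishes whenever i is even. The minimal sketch below works in the ordinary monomial basis; the paper's algorithms take the derivative in their special basis, so this is only illustrative of why the step is cheap, not of the actual decoder.

```rust
/// Formal derivative of a polynomial over GF(2^m), coefficients stored
/// low degree first. In characteristic 2 the scalar i reduces to i mod 2,
/// so odd-degree coefficients survive, even-degree ones vanish, and no
/// field multiplications are needed at all.
fn formal_derivative(poly: &[u16]) -> Vec<u16> {
    poly.iter()
        .enumerate()
        .skip(1) // the constant term disappears
        .map(|(i, &a)| if i % 2 == 1 { a } else { 0 })
        .collect()
}
```

For example, 1 + 2x + 3x^2 + 4x^3 maps to 2 + 0x + 4x^2, since the even-degree scalar factors cancel.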
Algorithm 1 line 6 on page 4 of https://github.com/catid/leopard/blob/master/docs/NovelPolynomialBasisFFT2016.pdf has the skew factor dependent upon i, j, beta, while the research implementation has the index into skewVec depend only upon r and beta, aka their j and index, in https://github.com/SianJhengLin/Fast-algorithms-of-Reed-Solomon-erasure-codes/blob/master/RSErasureCode.c#L73-L76
As we iterate through the recursion layers, we access far fewer elements of skewVec than the paper indicates, so presumably the code does not perform the FFT as described. As best I can tell, your code diverges from the paper similarly.
I still have not figured this out, but at least (19) justifies that something strange happens here.
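For reference, my reading of the loop structure at those lines, transcribed into Rust from memory, is roughly the sketch below; mul_e stands in for the C code's mulE multiply and skew_vec for its skewVec table, so treat it as an illustration of the indexing rather than a faithful port. The point is that the skew lookup is hoisted outside the innermost loop, so it can only depend on j and index, never on i.

```rust
// Sketch of the FFT-in-the-special-basis loop structure, from memory.
// `data` holds at least k symbols, `skew_vec` is the precomputed skew table,
// and MODULUS is the sentinel meaning "skip the multiply".
const MODULUS: u16 = 0xFFFF;

fn fft_in_basis(
    data: &mut [u16],
    k: usize,
    index: usize,
    skew_vec: &[u16],
    mul_e: impl Fn(u16, u16) -> u16,
) {
    let mut depart = k >> 1;
    while depart > 0 {
        let mut j = depart;
        while j < k {
            // One skew per (j, index) pair, shared by the whole butterfly group,
            // rather than one per (i, j) pair as a literal reading of the paper suggests.
            let skew = skew_vec[j + index - 1];
            for i in (j - depart)..j {
                if skew != MODULUS {
                    data[i] ^= mul_e(data[i + depart], skew);
                }
                data[i + depart] ^= data[i];
            }
            j += depart << 1;
        }
        depart >>= 1;
    }
}
```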
We've run into some hiccups making this work with more parity symbols than data symbols.
There is a check at leopard.cpp:135 to prevent this, and if it's removed the code segfaults, due to the benchmark/test code assuming all lost symbols come from the data symbols at benchmark.cpp:452. I suppose test cases should cover losing only some of the original data, but it'll presumably be straightforward to (randomly?) replace fewer than losses data symbols with parity symbols.
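Something like the following would spread the erasures across both ranges; it's just a sketch using the rand crate, not leopard's actual benchmark code, and pick_erasures is a made-up helper name.

```rust
use rand::seq::SliceRandom; // sketch only; assumes the rand crate

/// Choose `losses` positions to erase, drawn from the combined range of
/// original and recovery symbols instead of only from the original data,
/// which is what the current benchmark assumes.
fn pick_erasures(original_count: usize, recovery_count: usize, losses: usize) -> Vec<usize> {
    let mut positions: Vec<usize> = (0..original_count + recovery_count).collect();
    positions.shuffle(&mut rand::thread_rng());
    positions.truncate(losses);
    positions
}
```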
We also ran into some strange slowdown in leo_encode when adjusting the code to produce more parity symbols than data symbols. At first blush, there is seemingly just some runaway code that fills up many gigabytes of memory, which sounds strange since allocations seemingly occur outside leo_encode.

I've not yet fully understood the algorithm, so I'd love your input on whether you think either the encode or decode algorithm should run into any performance or other hiccups when handling more parity chunks than data chunks. Thoughts?