catid / leopard

Leopard-RS : O(N Log N) MDS Reed-Solomon Block Erasure Code for Large Data
BSD 3-Clause "New" or "Revised" License

Allow more parity symbols aka losses than data #16

Closed · burdges closed this issue 3 years ago

burdges commented 3 years ago

We've run into some hiccups making this work with more parity symbols than data symbols.

There is a check at leopard.cpp:135 that prevents this, and if it is removed the code segfaults:

#0  leopard::SIMDSafeFree (ptr=0xf8913455fd9fcc4a) at /tmp/leopard/tests/../LeopardCommon.h:480
#1  Benchmark (params=...) at /tmp/leopard/tests/benchmark.cpp:452

due to the benchmark/test code assuming all lost symbols come from the data symbols at benchmark.cpp:452. I suppose the test cases should cover losing only some of the original data, but it should presumably be straightforward to (randomly?) replace fewer than `losses` data symbols with parity symbols.
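For what it's worth, a hypothetical helper along these lines (not part of the existing benchmark; all names here are made up) could draw the erasures from the combined original+recovery range instead of always erasing data symbols:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: choose `losses` erasure positions uniformly from the
// combined original+recovery index range, then split them back into the two
// sets so the benchmark can null out the matching buffers before decoding.
void pick_losses(std::size_t original_count, std::size_t recovery_count, std::size_t losses,
                 std::vector<std::size_t>& lost_original,
                 std::vector<std::size_t>& lost_recovery)
{
    std::vector<std::size_t> all(original_count + recovery_count);
    for (std::size_t i = 0; i < all.size(); ++i)
        all[i] = i;

    std::mt19937 rng{std::random_device{}()};
    std::shuffle(all.begin(), all.end(), rng);

    for (std::size_t k = 0; k < losses && k < all.size(); ++k) {
        if (all[k] < original_count)
            lost_original.push_back(all[k]);                    // erase an original symbol
        else
            lost_recovery.push_back(all[k] - original_count);   // erase a parity symbol
    }
}
```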

We also ran into a strange slowdown in leo_encode when adjusting the code to produce more parity symbols than data symbols. At first blush there seems to be some runaway code that fills up many gigabytes of memory, which is strange since the allocations seemingly occur outside leo_encode.

I've not yet fully understood the algorithm so I'd love your input on whether you think either the encode or decode algorithm should run into any performance or other hiccups when handling more parity chunks than data chunks. Thoughts?

catid commented 3 years ago

IIRC the first issue you reported is a limitation of the code. It would require some large changes to the codec to support what you'd like to do.

catid commented 3 years ago

In the paper author's example code, this case corresponds to a different encode function:

https://github.com/SianJhengLin/Fast-algorithms-of-Reed-Solomon-erasure-codes/blob/674a839c1feb06f5afbd10c1770443954bb62849/RSErasureCode.c#L159

catid commented 3 years ago

I currently don't have the time to work on this large a change, sorry.

burdges commented 3 years ago

No worries! Thank you for your insights! If I understand you correctly, you believe the original paper's approach could be adapted to allow more parity symbols than data symbols?

Assuming so, we'd likely implement that algorithm in Rust more naively, so that it's more readable for us and our auditors. We'd then explore your fast optimized code for ideas on better exploiting AVX2, etc.

I'll close this issue for now since it goes beyond the scope of this library.

catid commented 3 years ago

Yes, it's expected to work once coded up. The paper does describe an algorithm for this as well.

burdges commented 3 years ago

Along with some superficial issues, it appears his encodeL does not implement Algorithm 1 in https://github.com/catid/leopard/blob/master/docs/LowRateDecoder.pdf, and it does not work right now. In fact, encodeL looks quite close to Algorithm 1 from https://arxiv.org/pdf/1404.3458.pdf, although the difference might be rather minimal if the FFT absorbed or replaced these twisting coefficients.

It's likely @SianJhengLin implemented three versions for himself while writing these three papers, and encodeL is simply a remnant, not what he expected people to be most interested in.

I'll try to better understand the algorithm in the near future, and especially figure out whether these different versions even use the same decoder. Thanks again!

cc https://github.com/catid/leopard/issues/14

catid commented 3 years ago

Ah, could be broken; I never tested that code path.

burdges commented 3 years ago

We actually noticed a plausible high-rate vs low-rate difference in computing the error locator polynomials in the decoder too, but only while reconstructing bits of it from first principles, so it's not necessarily relevant. Basically, we think the low-rate case benefits from using the formal derivative again.
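(For anyone following along: the formal derivative over a characteristic-2 field keeps only the odd-degree terms, since i * a_i vanishes for even i. A minimal sketch of that generic fact, not code from this codec:)

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch (generic GF(2^m) fact, not this codec's code): the formal
// derivative of f(x) = sum a_i x^i keeps only the odd-degree terms, because
// i * a_i means "add a_i to itself i times", which is zero for even i in
// characteristic 2 and a_i itself for odd i.
std::vector<uint16_t> formal_derivative(const std::vector<uint16_t>& coeffs)
{
    std::vector<uint16_t> out(coeffs.size() > 1 ? coeffs.size() - 1 : 0, 0);
    for (std::size_t i = 1; i < coeffs.size(); ++i)
        if (i & 1)
            out[i - 1] = coeffs[i];   // odd-degree coefficient survives, shifted down one degree
    return out;
}
```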

I suspect the division should be much lower than 50% for the high-rate trick there, so maybe the high-rate version would only rarely be optimal, but the actual code probably uses some more fair/uniform trick.

We actually care relatively little about the performance of the error locator polynomial, since we'll reuse the same error locator thousands of times, a la a 10 MB block size but under 400 bytes of output per decoder run.

burdges commented 3 years ago

Algorithm 1, line 6 on page 4 of https://github.com/catid/leopard/blob/master/docs/NovelPolynomialBasisFFT2016.pdf has the skew factor depend upon i, j, and beta, while the research implementation indexes skewVec using only r and beta, aka their j and index, in https://github.com/SianJhengLin/Fast-algorithms-of-Reed-Solomon-erasure-codes/blob/master/RSErasureCode.c#L73-L76

As we iterate through the recursion layers, we access far fewer elements of skewVec than the paper indicates, so presumably the code does not perform the FFT as described. As best I can tell, your code diverges from the paper similarly.
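To make the indexing point concrete, the butterfly in the research code looks roughly like this. This is a simplified, non-verbatim sketch from my reading of the linked RSErasureCode.c; GFSymbol, skewVec, mulE, and the sentinel constant stand in for that file's definitions, so take the details with a grain of salt:

```cpp
// Simplified sketch of the FFT butterfly shape as I read the linked file; not a
// verbatim copy. The skew multiplier is looked up once per group, indexed by the
// group position j plus the beta offset `index`, never by the element position i
// inside the group.
typedef unsigned short GFSymbol;
extern GFSymbol skewVec[];                        // precomputed skew factors (stand-in)
extern const GFSymbol kSkipSentinel;              // value meaning "skip the multiply" (stand-in)
GFSymbol mulE(GFSymbol a, GFSymbol logSkew);      // multiply a by the element with log `logSkew`

void fft_sketch(GFSymbol* data, int size, int index)
{
    for (int width = size >> 1; width > 0; width >>= 1) {
        for (int j = width; j < size; j += width << 1) {
            const GFSymbol skew = skewVec[j + index - 1];   // depends only on j and index
            if (skew != kSkipSentinel)
                for (int i = j - width; i < j; ++i)
                    data[i] ^= mulE(data[i + width], skew);
            for (int i = j - width; i < j; ++i)
                data[i + width] ^= data[i];
        }
    }
}
```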

I still have not figured this out, but at least (19) justifies that something strange happens here.