Cyan4973 / FiniteStateEntropy

New generation entropy codecs : Finite State Entropy and Huff0
BSD 2-Clause "Simplified" License

FSE_emergencyDistrib does not exit under certain conditions #44

Closed tjizep closed 9 years ago

tjizep commented 9 years ago

when all values in the first maxSymbolValue positions of the normalizedCounter are <= 1 the function will not exit and an infinite loop will result

Cyan4973 commented 9 years ago

Hi Christiaan

Indeed, FSE_emergencyDistrib() is probably one of the least tested functions, since it's a fallback of a fallback meant to handle rare corner cases, and as such may still have some unnoticed weird cases to tackle. Sorry, I have little time to work on this right now, but I'll come back and look at this issue in more detail later on.

Rgds

Cyan4973 commented 9 years ago

I had some time to look into this issue.

While what you point out is technically correct (if all positions of normalizedCounter are <= 1, an infinite loop will result), this is a situation which cannot happen.

The reason is : FSE_emergencyDistrib() is a fallback of a fallback, ultimately invoked from FSE_normalizeCount().

The fallback is only triggered when the number of points to remove is too large to cope with. If there are some points to remove, it means the total number of points already distributed within normalizedCounter is larger than the authorized sum, which is 1 << tableLog.

So this should not happen : with all positions <= 1, the sum can only exceed 1 << tableLog if there are more than (1 << tableLog) symbols, which would mean maxSymbolValue >= (1 << tableLog), a forbidden situation (impossible to compress). This condition is guaranteed by FSE_optimalTableLog().

It used to also be verified at the start of FSE_normalizeCount(), but the relevant code was removed when FSE_optimalTableLog() was externalized from it. I should probably add it again.

As a quick note : FSE_emergencyDistrib() is really a "quick & dirty" job. It's just supposed to work without crashing, but does little to optimize the compression ratio. Maybe it could, or should, be replaced with a stronger version, producing better compression results, albeit at the cost of some speed. That would require adapting the test framework to trigger and study relevant test cases.

Cyan4973 commented 9 years ago

The condition regarding the minimum size of tableLog for a given maxSymbolValue has been restored within FSE_normalizeCount().

Fuzzer tests have been upgraded to check this condition too : https://github.com/Cyan4973/FiniteStateEntropy/tree/dev

Cyan4973 commented 9 years ago

There is a new strategy proposed within FSE for emergency normalization. It's available within the dev branch : https://github.com/Cyan4973/FiniteStateEntropy/tree/dev

The new strategy compresses better. It's also slower. But considering it is very rarely used, if ever, its overall speed (and compression) impact will be low.

One thing these tests reminded me is how hard it is to get into a situation where emergency distribution is required. It's actually very hard. One way to achieve it is to try to compress an input using a very small tableLog value. For example, trying to compress a full 256-value alphabet using only 9 bits of code. In such a case, what will probably happen is that a lot of symbols will receive the minimum weight (1), while in fact being far too rare to actually deserve that much. This creates a "scarcity" situation, where the remaining symbols have to cope with less weight than needed. If none of the symbols is really dominant, the usual method, which is to make the most important symbol absorb the weight difference, won't be appropriate. In this case, a more complex emergency distrib (now renamed "renormSlow") is required.

So, in a nutshell, yes it can happen, but it's pretty rare, and requires an unreasonably small tableLog value to be triggered.

In any case, when such a situation nonetheless happens, the new distribution strategy should cope with it, not only without crashing (as the previous algorithm already did), but also while sacrificing less compression performance.

Cyan4973 commented 9 years ago

New version validated and included within master branch