Cyan4973 / FiniteStateEntropy

New generation entropy codecs : Finite State Entropy and Huff0
BSD 2-Clause "Simplified" License

a step in normalization of count #105

Open BeeBreeze opened 4 years ago

BeeBreeze commented 4 years ago

I am new here and still reading the source code. Why does this line set normalizedCounter[s] = -1? In my opinion, -1 should be 1. Could you please explain it to me? Thanks a lot. https://github.com/Cyan4973/FiniteStateEntropy/blob/12a533a9bf4d7bdcc507bf9d11302a7a1be454f5/lib/fse_compress.c#L459

Cyan4973 commented 2 years ago

It's a special case, meaning "this symbol has a weight of 1, because it can't be lower than 1, but really, it's so small that it should be a fraction of that". This information affects the way the table is built: not all positions in the table are equivalent, so such symbols are assigned the least probable positions.
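To make the "weight of 1, but really a fraction" idea concrete, here is a minimal sketch of how a normalization pass might flag such symbols. It is not the library's code: `sketch_normalize` is a hypothetical helper, the threshold `total >> tableLog` is an assumption about when a symbol's exact share falls to one cell or less, and the proportional scaling is deliberately naive.

```c
#include <assert.h>

/* Hypothetical helper, not the library's function: assigns a rough
 * normalized weight to each symbol for a table of (1 << tableLog) cells.
 * Assumption: a symbol whose exact share of the table would be at most
 * one cell (count <= total >> tableLog) is flagged with -1. */
static void sketch_normalize(const unsigned *count, int nbSymbols,
                             unsigned total, unsigned tableLog, short *norm)
{
    assert(total > 0 && tableLog < 15);
    unsigned const lowThreshold = total >> tableLog;
    for (int s = 0; s < nbSymbols; s++) {
        if (count[s] == 0) {
            norm[s] = 0;      /* absent symbol: no cell at all */
        } else if (count[s] <= lowThreshold) {
            norm[s] = -1;     /* still occupies one cell, flagged "less than 1" */
        } else {
            /* Naive proportional scaling, rounded down. A real normalizer
             * would also round and fix up the sum so that all weights
             * add up to exactly the table size. */
            norm[s] = (short)(((unsigned long long)count[s] << tableLog) / total);
        }
    }
}
```

For counts {900, 90, 9, 1} (total 1000) and a 32-cell table, the threshold is 31, so the last two symbols come out as -1: their exact shares would be about 0.29 and 0.03 of a cell, yet each still has to occupy one whole cell.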

This is pretty advanced stuff, and it's not "necessary" to know it. You may just as well give these symbols a weight of "1", and it will work. They will simply receive a "normal slot", which hurts the overall compression ratio by a very small amount, but it's no big deal.
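The difference between a -1 and a plain 1 can be seen in a simplified sketch of a table spread. The assumptions here are that flagged symbols are parked at the end of the table, and that the remaining symbols are scattered with a fixed odd stride that skips that reserved tail. `sketch_spread` and the stride constant are illustrative, not taken from the linked file.

```c
#include <assert.h>

/* Hypothetical helper: lays symbols out over a table of (1 << tableLog)
 * cells. norm[s] is the symbol's weight; -1 means "one cell, flagged as
 * less than 1". Precondition (assumed, not checked): the weights, with
 * each -1 counted as 1, add up to the table size. */
static void sketch_spread(const short *norm, int nbSymbols,
                          unsigned tableLog, unsigned char *table)
{
    assert(tableLog >= 2 && tableLog <= 12);
    unsigned const tableSize = 1u << tableLog;
    unsigned const tableMask = tableSize - 1;
    /* Assumed stride: odd, so it visits every cell exactly once. */
    unsigned const step = (tableSize >> 1) + (tableSize >> 3) + 3;
    unsigned highThreshold = tableSize - 1;
    unsigned position = 0;

    /* Pass 1: flagged symbols take the cells at the very end. */
    for (int s = 0; s < nbSymbols; s++)
        if (norm[s] == -1) table[highThreshold--] = (unsigned char)s;

    /* Pass 2: everything else is scattered, skipping the reserved tail. */
    for (int s = 0; s < nbSymbols; s++) {
        for (int i = 0; i < norm[s]; i++) {
            table[position] = (unsigned char)s;
            position = (position + step) & tableMask;
            while (position > highThreshold)
                position = (position + step) & tableMask;
        }
    }
}
```

With weights {10, 5, -1} on a 16-cell table (`tableLog` of 4), symbol 2 ends up in cell 15, the last one. Changing its weight to 1 sends it through the ordinary scatter instead, and it lands in cell 3, a "normal slot" in the middle of the table.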

JarekDuda commented 2 years ago

This is basic tuning. A year ago I finally wrote a paper about tuning: https://arxiv.org/pdf/2106.06438 For 2048 states and a 256-symbol alphabet, a ~100-byte header allows working within deltaH/H ~ 0.002 of Shannon.