InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Does the number of bits estimated by the neural network match that obtained by the actual entropy encoder? #259

Open aprilbian opened 10 months ago

aprilbian commented 10 months ago

As stated in the title: in the evaluation phase, calling the forward function gives you an estimated bits-per-pixel (bpp) value from the neural network. I would like to know whether this value is the same as, or very close to, the value obtained from the actual arithmetic encoder?

Thanks in advance for your reply!

YodaEmbedding commented 10 months ago

The forward function gives (what the model thinks is) the best probability distribution for encoding a given symbol. If the probability distribution used for encoding is different from the one the model tells us, then the average rate cost (bpp) should be higher than if we had used exactly the distribution the model told us to use. (Unless, of course, the model is wrong, which would be unfortunate, since the whole point of training is to make the model correct!)

Thus, the average rate cost for an ideal lossless entropy coder is the smallest when we use the model's predicted distribution. Any divergence from this (e.g., an imperfect lossless entropy coder which uses a slightly wrong distribution) should on average result in an increase in the rate cost.
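
To state that a bit more formally (this is standard information theory, nothing CompressAI-specific): if the data actually follows distribution $p$ but the coder uses $q$, an ideal coder pays the cross-entropy, which exceeds the entropy of $p$ by exactly the KL divergence:

$$
\mathbb{E}_{x \sim p}\!\left[-\log_2 q(x)\right] \;=\; H(p) + D_{\mathrm{KL}}(p \,\|\, q) \;\ge\; H(p),
$$

with equality only when $q = p$. The bpp reported by forward is (an empirical estimate of) this quantity with $q$ set to the model's own predicted distribution.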

The rANS entropy coder used by CompressAI is not a perfect entropy coder. It uses a slightly wrong distribution for several reasons:

  • The model's predicted scale parameters are snapped to a finite table of precomputed scales.
  • The CDFs handed to the range coder are quantized to finite integer precision.
  • The rANS bitstream itself carries a small constant overhead (e.g., state flushing/padding).

In most cases, this only leads to a <0.1% increase in rate over the "ideal" rate. In some rare cases where the trained entropy model breaks the assumptions we make (e.g., the model outputs scales outside the range of our precomputed scales, or frequently outputs scales that fall exactly between two adjacent precomputed scales), you may get a much larger increase in rate. If you were to use a high-precision traditional arithmetic coder with exactly computed distributions, it would get even closer to the model's "ideal" estimated bpp loss; perhaps to within <1 bit.
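
If you want to measure the gap yourself, here is a minimal sketch of the comparison (my addition, not from this thread): the pretrained zoo model, quality level, and random input are illustrative choices; forward, compress, and update are the actual CompressAI calls.

import torch
from compressai.zoo import bmshj2018_hyperprior

net = bmshj2018_hyperprior(quality=3, pretrained=True).eval()
net.update(force=True)  # build the quantized CDF tables used by the rANS coder

x = torch.rand(1, 3, 256, 256)  # placeholder image in [0, 1]
num_pixels = x.size(0) * x.size(2) * x.size(3)

with torch.no_grad():
    # Estimated rate: -log2 of the model's likelihoods for the (rounded) latents.
    out = net(x)
    est_bpp = sum(
        (-torch.log2(l).sum() / num_pixels).item()
        for l in out["likelihoods"].values()
    )

    # Actual rate: bytes produced by the rANS entropy coder.
    enc = net.compress(x)
    actual_bpp = 8.0 * sum(len(s[0]) for s in enc["strings"]) / num_pixels

print(f"estimated bpp: {est_bpp:.4f}  actual bpp: {actual_bpp:.4f}")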

Lastly, note that the bpp loss measured via forward is only "100% accurate" if:

  • y_hat is quantized by rounding instead of adding uniform noise.
  • Probably some other condition I can't think of right now.
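
For the first condition, the distinction is just the two quantization proxies used at training time versus coding time; a tiny illustration (the variable names here are mine, not CompressAI's):

import torch

y = torch.randn(8)                                         # latent values
y_hat_noise = y + torch.empty_like(y).uniform_(-0.5, 0.5)  # training-time proxy for quantization
y_hat_round = torch.round(y)                               # eval/compress-time quantization (rounding)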

aprilbian commented 10 months ago

> Lastly, note that the bpp loss measured via forward is only "100% accurate" if:
>
>   • y_hat is quantized by rounding instead of adding uniform noise.
>   • Probably some other condition I can't think of right now.

Thanks a lot for the timely reply! I understand that in the evaluation phase, y_hat is indeed quantized by rounding rather than by adding uniform noise, so the bpp produced by the forward function should match the real compression cost.

In my case, I use compressai version 1.2.0. The forward function produces a bpp of around 4.5 (equivalent to about 5000 bits) in the evaluation phase, but when I produce the actual output, the saved byte-string file occupies 1100 bytes (8800 bits). I run model.update(force=True) before calling the compress function. Could you suggest any possible reasons for the mismatch? Thanks! The code is attached (the commented-out part is used to generate the estimated bit cost).

[image: screenshot of the attached code]

BTW, if I am compressing a 1000-element sequence, do you think the number of elements is large enough that the entropy encoder can produce a result very close to the forward function's output? Moreover, is the number of bits used by the entropy encoder to compress the input equal to the size of the saved file?

YodaEmbedding commented 10 months ago

Try the items mentioned in https://github.com/InterDigitalInc/CompressAI/issues/236#issuecomment-1594490483:

  • Minimize aux_loss to 0 using the code from https://github.com/InterDigitalInc/CompressAI/pull/231, then re-run update and eval. [Example] (A rough sketch of this step is shown after this list.)
  • Try printing out or plotting the ranges of the distributions to make sure they are sane, i.e., not too wide; typically, about three channels should have a dynamic range of ~50, and the rest should have a dynamic range of <5.
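
For the first item, a rough sketch of the idea (not the exact code from the linked PR): fine-tune only the entropy-bottleneck quantiles until aux_loss is near zero, then rebuild the CDF tables. The helper name and hyperparameters below are illustrative.

import torch

def minimize_aux_loss(model, steps=1000, lr=1e-3):
    # Hypothetical helper: optimize only the *.quantiles parameters of the
    # entropy bottleneck(s) so that model.aux_loss() is driven towards 0.
    params = [p for n, p in model.named_parameters() if n.endswith(".quantiles")]
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        aux = model.aux_loss()
        aux.backward()
        optimizer.step()
    # Rebuild the quantized CDF tables used by compress()/decompress().
    model.update(force=True)
    return model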

> Do you think the number of elements is large enough that the entropy encoder can produce a result very close to the forward function's output?

I think there's a minimum of 8 or 16 bytes due to rANS (or something like that). Other than that, it depends on the symbols and their distributions (e.g., if all symbols are the most probable value and their encoding distribution is the Dirac delta function, $\delta[x]$, then the bit cost should be 0), and also the "wrong distribution" reasons I mentioned above.

> Moreover, is the number of bits used by the entropy encoder to compress the input equal to the size of the saved file?

It should be, assuming the data was written out like this (where strings is the "strings" entry returned by model.compress):

[y_strings, z_strings] = strings

for i, s in enumerate(y_strings):
    with open(f"{i}.y.bin", "wb") as f:
        f.write(s)

for i, s in enumerate(z_strings):
    with open(f"{i}.z.bin", "wb") as f:
        f.write(s)
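
As a quick sanity check (my addition, not from the thread; it assumes the file names used above), the summed length of the strings should then equal the summed size of the files on disk:

import os

total_bits_from_strings = 8 * sum(len(s) for s in y_strings + z_strings)
total_bits_on_disk = 8 * sum(
    os.path.getsize(f"{i}.{suffix}.bin")
    for suffix, parts in (("y", y_strings), ("z", z_strings))
    for i in range(len(parts))
)
assert total_bits_from_strings == total_bits_on_disk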