InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

A question about bpp calculation #52

Closed. Cyprus-hy closed this issue 3 years ago

Cyprus-hy commented 3 years ago

I am sorry to bother you again, but I have a question about the bpp calculation. I built a toy network that tries to compress randomly generated float data. After training, I run inference on the same training data. What confuses me is that the actual bpp (calculated from the length of the compressed string) is much larger than the theoretical bpp (calculated from the likelihoods). Shouldn't the actual bpp be almost the same as the theoretical bpp? Am I leaving something out? Hoping for your reply, thanks a lot. The code follows: I first randomly generate data of size batch*channel (10*64), and the input size of the entropy bottleneck is 10*16:

from compressai.models import CompressionModel
import torch
from torch.nn import Linear
import torch.nn as nn
import torch.optim as optim
from torch.nn import MSELoss
import math

class Network(CompressionModel):
    def __init__(self):
        super().__init__(entropy_bottleneck_channels=16)
        self.encoder = nn.Sequential(
            Linear(64, 32),
            Linear(32, 16)
        )

        self.decoder = nn.Sequential(
            Linear(16, 32),
            Linear(32, 64)
        )

    def forward(self, x):
        y = self.encoder(x)
        y_hat, y_likelihoods = self.entropy_bottleneck(y)
        x_hat = self.decoder(y_hat)

        return x_hat, y_likelihoods

# mse loss
mloss = MSELoss()

# Data
torch.manual_seed(10)
data = torch.rand(10, 64).float().cuda()

# Model
model = Network().cuda()
# optimizer
parameters = set(p for n, p in model.named_parameters() if not n.endswith(".quantiles"))
aux_parameters = set(p for n, p in model.named_parameters() if n.endswith(".quantiles"))
optimizer = optim.Adam(parameters, lr=1e-4)
aux_optimizer = optim.Adam(aux_parameters, lr=1e-3)

# train
for i in range(1, 10001):
    optimizer.zero_grad()
    aux_optimizer.zero_grad()

    x_hat, y_likelihoods = model(data)

    mse_loss = mloss(x_hat, data)
    B, C = data.size()
    # total bits implied by the likelihoods, divided by the batch size (bits per sample)
    bpp_loss = torch.log(y_likelihoods).sum() / (-math.log(2) * B)
    distortion_loss = mse_loss + 1e-2 * bpp_loss
    distortion_loss.backward()
    optimizer.step()

    aux_loss = model.aux_loss()
    aux_loss.backward()
    aux_optimizer.step()

    if i % 1000 == 0:
        print("iteration:", i)
        print("distortion loss:", distortion_loss.item())
        print('bpp_loss:', bpp_loss.item())
        print('mse_loss:', mse_loss.item())
        print('aux_loss:', aux_loss.item())

torch.save(model.state_dict(), './model.pth')

# load
model2 = Network().cuda()
ckpt = torch.load('./model.pth')
model2.load_state_dict(ckpt)
model2.update()

# theoretical bpp
_, y_likelihoods = model2.forward(data)
print('theoretical bpp:', (torch.log(y_likelihoods).sum() / (-math.log(2) * 10 )).item())

# actual bpp
x = model2.encoder(data)
string = model2.entropy_bottleneck.compress(x)
# compressed length in bytes -> bits, divided by the batch size of 10
bpp = sum(len(s) for s in string) * 8.0 / 10
print('actual bpp:', bpp)

Here the theoretical bpp is 4.49, but the actual bpp is 64.

fracape commented 3 years ago

Hi, thanks for reporting, that is strange indeed. You train and run inference on the same data, so the network should learn your batch and you should get very small, and matching, theoretical and actual bitrates. Have you checked that your aux_loss converges with the learning rate you chose? You might also be hitting a limitation in how the channels are processed by the entropy bottleneck (and the ANS coder), since you encode a rather small vector; 64 bits per sample (8 bytes) seems to point to a minimum number of bytes. Please let me know if you investigate further.
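
A minimal sketch of one way to look into this, reusing only model2 and data from the snippet above (this is an editorial illustration, not part of the original discussion): it prints the information content implied by the likelihoods for each channel and the size of each compressed string, which should make any fixed per-string overhead visible.

import torch

model2.eval()
with torch.no_grad():
    y = model2.encoder(data)                       # (10, 16) latent
    _, y_likelihoods = model2.entropy_bottleneck(y)

    # bits implied by the likelihoods, summed per channel over the batch
    bits_per_channel = (-torch.log2(y_likelihoods)).sum(dim=0)
    print('estimated bits per channel:', bits_per_channel)

    # compress() returns one string per batch element; inspect their sizes
    strings = model2.entropy_bottleneck.compress(y)
    print('bytes per string:', [len(s) for s in strings])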

Cyprus-hy commented 3 years ago

Thanks for your reply. I have checked that both the aux_loss and the distortion loss (almost) converge. I also tried increasing the vector's size from 10*64 to 10*1000000 (1000000 is roughly the num_pixels of a normal image, and the number of entropy bottleneck channels is still 16), but the actual bpp is still much larger than the theoretical bpp. Besides, I compared the actual bpp and the theoretical bpp in the comparison notebook, and they are indeed almost the same, just as issue 12 shows. So far I have no idea what is going on.
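
One possibly relevant difference with the comparison notebook (an assumption on my side, not something verified in this thread): the pretrained models there pass 4D tensors of shape (N, C, H, W) to the entropy bottleneck, while the toy model uses a 2D latent. A rough sketch of the same actual-vs-theoretical comparison with the latent reshaped to (N, C, 1, 1), reusing model2 and data from above:

import math
import torch

model2.eval()
with torch.no_grad():
    # reshape the (10, 16) latent to (10, 16, 1, 1) before entropy coding
    y = model2.encoder(data).unsqueeze(-1).unsqueeze(-1)

    _, y_likelihoods = model2.entropy_bottleneck(y)
    print('theoretical bpp:', (torch.log(y_likelihoods).sum() / (-math.log(2) * 10)).item())

    strings = model2.entropy_bottleneck.compress(y)
    print('actual bpp:', sum(len(s) for s in strings) * 8.0 / 10)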

ZhangYuef commented 3 years ago

I am not sure why this issue happens, but I think it's necessary to add model2.eval() before the forward pass:

# theoretical bpp
model2.eval()
_, y_likelihoods = model2.forward(data)
Cyprus-hy commented 3 years ago

Thanks for your reply. I have added "model2.eval()" according to your suggestion, but it still doesn't work.

ZhangYuef commented 3 years ago

How about adding with torch.no_grad() together with model2.eval()? Just guessing.
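
A small sketch of that suggestion applied to the theoretical-bpp computation above (reusing model2 and data; this just illustrates the two calls, it is not a confirmed fix):

import math
import torch

model2.eval()          # rounding instead of the additive-noise quantization used during training
with torch.no_grad():  # no gradients needed for evaluation
    _, y_likelihoods = model2(data)
    print('theoretical bpp:', (torch.log(y_likelihoods).sum() / (-math.log(2) * 10)).item())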

jbegaint commented 3 years ago

Closing stale issue. If you think it should remain open, feel free to reopen it.