InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Questions about using entropybottleneck #266

Open James89045 opened 8 months ago

James89045 commented 8 months ago

I want to use EntropyBottleneck to calculate the pmf of the weights of a trained model. The weight values mostly lie in (-1, 1), and the model has about 12M parameters. However, the estimated probabilities turned out to be very far from the ground-truth probabilities. In my implementation, I optimize the EntropyBottleneck with the auxiliary loss until it drops below 0.1, and then use the trained EntropyBottleneck to evaluate the pmf of another model's weights. The result is quite bad. I'm wondering which parts of my implementation could be improved? Thank you very much!
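
Roughly, what I am doing looks like this (a simplified, untested sketch; other_model stands in for the second model whose weights I evaluate):

import torch
from compressai.entropy_models import EntropyBottleneck

entropy_bottleneck = EntropyBottleneck(1)
aux_optimizer = torch.optim.Adam(entropy_bottleneck.parameters(), lr=1e-3)

# Optimize the auxiliary loss until it drops below 0.1.
for _ in range(10_000):
    aux_optimizer.zero_grad()
    aux_loss = entropy_bottleneck.loss()
    aux_loss.backward()
    aux_optimizer.step()
    if aux_loss.item() < 0.1:
        break

# Evaluate the likelihoods (pmf) of another model's weights,
# flattened into an (N, C, L) = (1, 1, ~12M) tensor.
entropy_bottleneck.eval()
weights = torch.cat([p.detach().flatten() for p in other_model.parameters()])
_, likelihoods = entropy_bottleneck(weights.reshape(1, 1, -1))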

YodaEmbedding commented 8 months ago

Just to confirm, do you mean that you're running the model's weights through the EntropyBottleneck?

Is this for the purpose of compressing an input model's weights? e.g., to compress a "ResNet" model:

weight_compression_model = WeightCompressionModel()
resnet_model = ResNet34()

weights = resnet_model.parameters()
out_net = weight_compression_model(weights)

If so, then note:

If the goal is instead just to find a pmf for visualization, then you may have more success using torch.histogram (or similar) to bin the values, rather than an EntropyBottleneck, which is intended more for compressing an input signal.
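
For example, something like this (rough, untested sketch, using the resnet_model from above) gives a binned empirical pmf for plotting:

import torch

# Flatten all weights into one tensor and bin them.
weights = torch.cat([p.detach().flatten().cpu() for p in resnet_model.parameters()])
hist, bin_edges = torch.histogram(weights, bins=256)
pmf = hist / hist.sum()  # empirical probability mass per bin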

James89045 commented 8 months ago

Thanks for your reply! Actually, I want to add a rate term to optimize my main network; the goal is to reduce the bit rate of the model's parameters.

My calculation involves analyzing each layer of the model: I take the weights of each layer and feed them into the entropy bottleneck to compute their pmf. From the pmf I can compute the entropy, and summing the entropies over all layers gives me the complete rate-term loss value. The weights within each layer are treated as i.i.d. samples; I use the entropy bottleneck simply as a differentiable way of computing the pmf. However, since the weight values lie in (-1, 1), they may not match the scale the code expects, leading to inaccurate probability estimation.

In this scenario, apart from rescaling them, what other adjustments can be made to improve the accuracy of the estimated pmf? I greatly appreciate any insights!

YodaEmbedding commented 8 months ago

Create an entropy bottleneck for each set of weights. To get a simple implementation working, I would just use a single-channel EntropyBottleneck(1) for each. Later, you can increase the number of channels, or find better groupings of channels.

# A different entropy bottleneck for each set of params.
self.entropy_bottleneck = nn.ModuleDict({
    f"{module_name}_{param_name}".replace(".", "_"): EntropyBottleneck(1)
    for module_name, module in inner_model.named_modules()
    # recurse=False avoids registering the same param at every parent module.
    for param_name, param in module.named_parameters(recurse=False)
    # or something like this
})

# A trainable gain vector for each set of weights.
# Needed to control the number of quantization bins.
self.weight_gain_vector = nn.ParameterDict({
    name: nn.Parameter(torch.ones(entropy_bottleneck.channels))
    for name, entropy_bottleneck in self.entropy_bottleneck.items()
    # Reshape the dimensions as needed.
})
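
For reference, here is a rough, untested sketch of how the rate term could then be computed from these (a method of the same model; adapt the key construction to however you actually name things):

import torch

def rate_loss(self, inner_model):
    total_bits = 0.0
    for module_name, module in inner_model.named_modules():
        for param_name, param in module.named_parameters(recurse=False):
            key = f"{module_name}_{param_name}".replace(".", "_")
            gain = self.weight_gain_vector[key]
            # Scale the weights, then reshape to (N, C, L) = (1, 1, num_params).
            x = (param * gain).reshape(1, 1, -1)
            _, likelihoods = self.entropy_bottleneck[key](x)
            total_bits = total_bits + (-torch.log2(likelihoods)).sum()
    return total_bits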

Also, depending on your weights, they may have some covariance structure, so you may benefit from an entropy model (similar to, e.g., a hyperprior) that takes this into account. General-purpose lossless compressors (e.g. LZMA) should partly match the reported rate here, since they also do some form of context adaptation to the incoming data stream.
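
As a sanity check, something like this (rough, untested sketch; the step size q is just an example, and resnet_model is the model from above) gives an LZMA baseline in bits per weight to compare against:

import lzma

import numpy as np
import torch

q = 1 / 256  # example uniform quantization step size
weights = torch.cat([p.detach().flatten().cpu() for p in resnet_model.parameters()])
symbols = np.round(weights.numpy() / q).astype(np.int16)
compressed = lzma.compress(symbols.tobytes())
bits_per_weight = 8 * len(compressed) / symbols.size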

James89045 commented 8 months ago

Thank you very much! I will try it and share the result!