InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Questions about using entropybottleneck #266

Open James89045 opened 8 months ago

James89045 commented 8 months ago

I want to use EntropyBottleneck to calculate the pmf of the weights of a trained model. The weight values mostly lie in (-1, 1), and the model has about 12M parameters. However, the estimated probabilities turned out to be very far from the ground-truth probabilities. In my implementation, I optimize the EntropyBottleneck with the auxiliary loss until it drops below 0.1, and then use the trained EntropyBottleneck to evaluate the pmf of another model's weights. The result is quite bad. I'm wondering which parts of my implementation could be improved? Thank you very much!
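
Roughly, what I am doing looks like this (a simplified, untested sketch; other_model stands in for the second model whose weights I evaluate):

import torch
from compressai.entropy_models import EntropyBottleneck

entropy_bottleneck = EntropyBottleneck(1)
aux_optimizer = torch.optim.Adam(entropy_bottleneck.parameters(), lr=1e-3)

# Optimize the auxiliary loss until it drops below 0.1.
for _ in range(10_000):
    aux_optimizer.zero_grad()
    aux_loss = entropy_bottleneck.loss()
    aux_loss.backward()
    aux_optimizer.step()
    if aux_loss.item() < 0.1:
        break

# Evaluate the likelihoods (pmf) of another model's weights,
# flattened into an (N, C, L) = (1, 1, ~12M) tensor.
entropy_bottleneck.eval()
weights = torch.cat([p.detach().flatten() for p in other_model.parameters()])
_, likelihoods = entropy_bottleneck(weights.reshape(1, 1, -1))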

YodaEmbedding commented 8 months ago

Just to confirm, do you mean that you're running the model's weights through the EntropyBottleneck?

Is this for the purpose of compressing an input model's weights? e.g., to compress a "ResNet" model:

weight_compression_model = WeightCompressionModel()
resnet_model = ResNet34()

weights = resnet_model.parameters()
out_net = weight_compression_model(weights)

If so, then note:

If the goal is instead just to find a pmf for visualization, then you may have more success using torch.histogram (or similar) to bin the values, rather than an EntropyBottleneck, which is intended more for compressing an input signal.
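
For example, something like this (rough, untested sketch, using the resnet_model from above) gives a binned empirical pmf for plotting:

import torch

# Flatten all weights into one tensor and bin them.
weights = torch.cat([p.detach().flatten().cpu() for p in resnet_model.parameters()])
hist, bin_edges = torch.histogram(weights, bins=256)
pmf = hist / hist.sum()  # empirical probability mass per bin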

James89045 commented 8 months ago

Thanks for your reply! Actually, I want to add a rate term to optimize my main network; the goal is to reduce the bit rate of the model's parameters.

My calculation involves analyzing each layer of the model: I take the weights of each layer and feed them into the entropy bottleneck to compute their pmf. From the pmf I can compute the entropy, and summing the entropies over all layers gives me the complete rate-term loss value. The weights within each layer are treated as i.i.d. samples; I use the entropy bottleneck simply as a differentiable way of computing the pmf. However, since the weight values lie in (-1, 1), they may not match the scale the code expects, leading to inaccurate probability estimation.

In this scenario, apart from rescaling them, what other adjustments can be made to improve the accuracy of the estimated pmf? I greatly appreciate any insights!

YodaEmbedding commented 8 months ago

Create an entropy bottleneck for each set of weights. To get a simple implementation working, I would just use a single-channel EntropyBottleneck(1) for each. Later, you can increase the number of channels, or find better groupings of channels.

# A different entropy bottleneck for each set of params.
self.entropy_bottleneck = nn.ModuleDict({
    f"{module_name}_{param_name}".replace(".", "_"): EntropyBottleneck(1)
    for module_name, module in inner_model.named_modules()
    # recurse=False avoids registering the same param at every parent module.
    for param_name, param in module.named_parameters(recurse=False)
    # or something like this
})

# A trainable gain vector for each set of weights.
# Needed to control the number of quantization bins.
self.weight_gain_vector = nn.ParameterDict({
    name: nn.Parameter(torch.ones(entropy_bottleneck.channels))
    for name, entropy_bottleneck in self.entropy_bottleneck.items()
    # Reshape the dimensions as needed.
})
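
For reference, here is a rough, untested sketch of how the rate term could then be computed from these (a method of the same model; adapt the key construction to however you actually name things):

import torch

def rate_loss(self, inner_model):
    total_bits = 0.0
    for module_name, module in inner_model.named_modules():
        for param_name, param in module.named_parameters(recurse=False):
            key = f"{module_name}_{param_name}".replace(".", "_")
            gain = self.weight_gain_vector[key]
            # Scale the weights, then reshape to (N, C, L) = (1, 1, num_params).
            x = (param * gain).reshape(1, 1, -1)
            _, likelihoods = self.entropy_bottleneck[key](x)
            total_bits = total_bits + (-torch.log2(likelihoods)).sum()
    return total_bits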

Also, depending on your weights, they may have some covariance structure, so you may benefit from an entropy model (similar to, e.g., a hyperprior) that takes this into account. General-purpose lossless compressors (e.g. LZMA) should partly match the reported rate here, since they also do some form of context adaptation to the incoming data stream.
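
As a sanity check, something like this (rough, untested sketch; the step size q is just an example, and resnet_model is the model from above) gives an LZMA baseline in bits per weight to compare against:

import lzma

import numpy as np
import torch

q = 1 / 256  # example uniform quantization step size
weights = torch.cat([p.detach().flatten().cpu() for p in resnet_model.parameters()])
symbols = np.round(weights.numpy() / q).astype(np.int16)
compressed = lzma.compress(symbols.tobytes())
bits_per_weight = 8 * len(compressed) / symbols.size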

James89045 commented 8 months ago

Thank you very much! I will try it and share the result!