InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Adding new codec into CompressAI #277

Open ali9609 opened 8 months ago

ali9609 commented 8 months ago

Hi,

First of all, thanks for the great work; it really simplifies the maintenance and development of codecs, which are otherwise very hard to do. I am trying to add another codec to the CompressAI library, but I need some quick suggestions since it works a bit differently than typical codecs.

So I have the following feature compression codec by Ahuja et al., CVPR 2023.

[Figure: block diagram of the feature compression codec from Ahuja et al., CVPR 2023]

The training objective of Ahuja et al., CVPR 2023 remains the same as that of the original hyperprior architecture by Ballé et al., ICLR 2018; specifically, the rate term is given as follows.

Rate $= \mathbb{E}\left[-\log_{2} p(y \mid z) - \log_{2} p(z)\right]$
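
In CompressAI terms, I estimate this rate from the likelihoods returned by the model's forward pass, roughly like this (a minimal sketch; rate_bpp is just my illustrative helper, not a library function):

    import math

    import torch


    def rate_bpp(likelihoods: dict, num_pixels: int) -> torch.Tensor:
        """Estimate the rate term in bits per pixel from model likelihoods."""
        return sum(
            torch.log(l).sum() / (-math.log(2) * num_pixels)
            for l in likelihoods.values()
        )


    # Usage, e.g.:
    # out = net(x)
    # rate = rate_bpp(out["likelihoods"], x.size(0) * x.size(2) * x.size(3))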

According to the codec diagram from the paper shown above, I should do the following, using the hyperprior implementation from compressai/models/google.py.

        y = self.g_a(x)
        z = self.h_a(torch.abs(y))
        z_hat, z_likelihoods = self.entropy_bottleneck(z)  # Compute log2 p(z)
        scales_hat = self.h_s(z_hat)
        y_hat, y_likelihoods = self.gaussian_conditional(y, scales_hat)  # Compute log2 p(y|z)
        x_hat = self.g_s(y_hat)

As you already know, self.gaussian_conditional depends on scales_hat during both training and inference. However, in the codec shown above, they compute the self.gaussian_conditional term only during training, for the loss calculation, and discard it during inference. Is there a way I can tweak the code above so that I can also do what they are proposing? Thank you very much.

YodaEmbedding commented 8 months ago

For forward, during training, output both y_likelihoods and z_likelihoods:

    def forward(self, x, training=None):
        if training is None:
            training = self.training

        y = self.g_a(x)

        # Adapted from FactorizedPrior:
        y_infer_hat, y_infer_likelihoods = self.entropy_bottleneck_y(y.detach())

        if training:
            # Copied from MeanScaleHyperprior:
            z = self.h_a(y)
            z_hat, z_likelihoods = self.entropy_bottleneck(z)
            gaussian_params = self.h_s(z_hat)
            scales_hat, means_hat = gaussian_params.chunk(2, 1)
            y_hat, y_likelihoods = self.gaussian_conditional(
                y, scales_hat, means_hat
            )
            likelihoods = {
                "y": y_likelihoods,
                "z": z_likelihoods,
                "y_infer": y_infer_likelihoods,
            }
        else:
            y_hat = y_infer_hat
            likelihoods = {
                "y_infer": y_infer_likelihoods,
            }

        x_hat = self.g_s(y_hat)

        if not training:
            # Optionally avoid training g_s if training for only inference mode loss.
            # This can be done for any other outputs of y_hat, too.
            # In practice, it shouldn't really matter though.
            # Another easy alternative is just to freeze g_s's weights.
            #
            # x_hat = x_hat.detach()

            # Optional:
            # x_hat = x_hat.clamp_(0, 1)

            pass

        return {
            "x_hat": x_hat,
            "likelihoods": likelihoods,
        }
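
For the snippet above to run, the module also needs the extra entropy_bottleneck_y declared in __init__. A minimal sketch, where the class name and the N/M defaults are placeholders rather than values from the paper:

    from compressai.entropy_models import EntropyBottleneck
    from compressai.models import MeanScaleHyperprior


    class HyperpriorWithFactorizedY(MeanScaleHyperprior):
        """MeanScaleHyperprior plus a factorized entropy model over y."""

        def __init__(self, N=128, M=192, **kwargs):
            super().__init__(N=N, M=M, **kwargs)
            # y = g_a(x) has M channels, so the extra bottleneck models M channels.
            self.entropy_bottleneck_y = EntropyBottleneck(M)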

The compress/decompress can be adapted from FactorizedPrior:

    def compress(self, x):
        y = self.g_a(x)
        y_strings = self.entropy_bottleneck_y.compress(y)
        return {"strings": [y_strings], "shape": y.size()[-2:]}

    def decompress(self, strings, shape):
        assert isinstance(strings, list) and len(strings) == 1
        y_hat = self.entropy_bottleneck_y.decompress(strings[0], shape)
        x_hat = self.g_s(y_hat).clamp_(0, 1)
        return {"x_hat": x_hat}

Unless I'm misunderstanding something, the pretrained hyperprior weights should also work with this architecture. You can load those weights, freeze them, and then train only entropy_bottleneck_y.
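
Loading and freezing could look roughly like this (a sketch that assumes the hypothetical HyperpriorWithFactorizedY class from the snippet above, and a checkpoint whose keys match those module names; the checkpoint path is a placeholder):

    import torch

    model = HyperpriorWithFactorizedY(N=128, M=192)

    # strict=False lets the load succeed even though the checkpoint has no
    # entropy_bottleneck_y weights; that module stays randomly initialized.
    state_dict = torch.load("hyperprior_checkpoint.pth", map_location="cpu")
    model.load_state_dict(state_dict, strict=False)

    # Freeze everything, then unfreeze only the new factorized bottleneck.
    for p in model.parameters():
        p.requires_grad_(False)
    for p in model.entropy_bottleneck_y.parameters():
        p.requires_grad_(True)

    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )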

ali9609 commented 8 months ago

Thank you very much for your response. If I follow the architectural setting provided by Ahuja et al., CVPR 2023, I will have a different number of channels for y (64 channels) and z (8 channels). This implies that I would have to declare a separate self.entropy_bottleneck2 for y, which means I cannot use the pretrained hyperprior weights, since no entropy_bottleneck2 exists in them. Is there any workaround for this? How do I incorporate this during training?

YodaEmbedding commented 8 months ago

Ah yes. There are a few possible approaches:

  1. Load a model that is pretrained with the hyperprior rate loss. Then freeze its weights and train only entropy_bottleneck_y.
  2. Detach y (this prevents g_a from receiving gradients), and add the likelihoods for entropy_bottleneck_y to the loss. Discard y_infer_hat; in this simple setup, it's equal to y_hat anyway.

I've updated the code above with another entropy_bottleneck_y. (Effectively approach 2.)
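
For approach 2, the rate part of the training loss then just sums all three likelihood terms returned by forward. A rough sketch (the function name and lmbda are placeholders; the 255**2 scaling is a common convention for inputs in [0, 1]):

    import math

    import torch
    import torch.nn.functional as F


    def rd_loss(out, x, lmbda):
        """Rate-distortion loss sketch for approach 2.

        During training, out["likelihoods"] holds "y", "z", and "y_infer".
        Since y is detached before entropy_bottleneck_y, the "y_infer" term
        only updates the factorized entropy model, not g_a.
        """
        num_pixels = x.size(0) * x.size(2) * x.size(3)
        bpp = sum(
            torch.log(l).sum() / (-math.log(2) * num_pixels)
            for l in out["likelihoods"].values()
        )
        mse = F.mse_loss(out["x_hat"], x)
        return lmbda * 255**2 * mse + bpp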

ali9609 commented 8 months ago

Thank you very much for your response. I actually tried both approaches, just to have some comparison.

Approach 1:

Idea 1: Freeze everything and train only entropy_bottleneck_y.

Setting: The codec was initially trained for 30 epochs. After adding entropy_bottleneck_y, I trained it again for 30 epochs; since only a single layer was trainable, this was actually quite fast.

Results:

Remarks: I think the results look reasonable; it's obvious that the factorized prior is not as bitrate-efficient as the hyperprior. I believe it's not possible to achieve the exact same RD tradeoff or improve on it; some degradation is to be expected.

Idea 2: Use your given code and retrain the codec from scratch. The detach trick makes it possible to do so.

Setting: The codec was trained for 30 epochs. Note that the same alpha was used as in the experiment above, but it resulted in a different tradeoff.

Results:

Remarks: The same value of alpha resulted in a completely different tradeoff. I probably need to increase alpha to reach the same or a nearby tradeoff.

I think Idea 1 is more useful to me, as it gives more control and you can also use the pretrained weights.

Please let me know if something looks unusual or if there is a need for improvement.