InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Question about the gaussian_conditional model in Cheng2020 #316

Open achel-x opened 2 weeks ago

achel-x commented 2 weeks ago

When I run the Cheng2020 series, I noticed that Cheng2020Anchor inherits from JointAutoregressiveHierarchicalPriors. However, the `gaussian_conditional` model in `JointAutoregressiveHierarchicalPriors` is a plain GaussianConditional, not a Gaussian mixture model.

I tried adding a line in Cheng2020Anchor:

self.gaussian_conditional = GaussianMixtureConditional()

But it failed to run.

[screenshot of the resulting error]

What do the weights mean, and what should I pass to `GaussianMixtureConditional()`?

chunbaobao commented 1 week ago

The discretized Gaussian mixture likelihood follows the equation in the paper:

$$p_{\hat{y}_i \mid \hat{z}}(\hat{y}_i \mid \hat{z}) = \left( \sum_{k=1}^{K} \omega_i^{(k)} \, \mathcal{N}\big(\mu_i^{(k)}, \sigma_i^{2(k)}\big) * \mathcal{U}\big(-\tfrac{1}{2}, \tfrac{1}{2}\big) \right)(\hat{y}_i)$$

In this equation, $\omega$ refers to the weights in the code:

https://github.com/InterDigitalInc/CompressAI/blob/743680befc146a6d8ee7840285584f2ce00c3732/compressai/entropy_models/entropy_models.py#L735-L751

Usually, the parameters of the latent distribution, including the weights, are outputs of a neural network.
You can slightly modify the network's output to obtain the weights.

https://github.com/InterDigitalInc/CompressAI/blob/743680befc146a6d8ee7840285584f2ce00c3732/compressai/models/google.py#L534-L554
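
For intuition, here is a minimal sketch (not the CompressAI implementation) of how that discretized mixture likelihood can be evaluated, assuming per-component tensors of shape `(B, K, M, H, W)` whose weights are already normalized over the `K` dimension:

```python
import torch
from torch.distributions import Normal


def gmm_bin_likelihood(y_hat, scales, means, weights):
    # y_hat:                  (B, M, H, W)    quantized latents
    # scales, means, weights: (B, K, M, H, W) per-component parameters,
    #                         weights summing to 1 over dim=1
    dist = Normal(means, scales.clamp(min=1e-9))
    y = y_hat.unsqueeze(1)  # broadcast over the K mixture components

    # probability mass of the quantization bin [y - 0.5, y + 0.5] per component
    per_component = dist.cdf(y + 0.5) - dist.cdf(y - 0.5)

    # weighted sum over the mixture components -> (B, M, H, W)
    return (weights * per_component).sum(dim=1)
```

The actual entropy model in CompressAI additionally handles details such as a lower bound on the likelihood for numerical stability.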

achel-x commented 4 days ago

Thanks for your kind instructions.

I tried a modification based on the issue linked here:

https://github.com/InterDigitalInc/CompressAI/issues/289#issuecomment-2485716961

I tried to make the same modification, using GaussianMixtureConditional as shown below:

```python
import torch
import torch.nn as nn

# adjust import paths to your CompressAI version
from compressai.entropy_models import GaussianMixtureConditional
from compressai.models import Cheng2020Anchor


class Cheng2020GMM(Cheng2020Anchor):
    def __init__(self, N=192, **kwargs):
        super().__init__(N=N, **kwargs)

        self.K = 3  # number of mixture components for the GMM

        self.entropy_parameters = nn.Sequential(
            nn.Conv2d(N * 12 // 3, N * 10 // 3, 1),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(N * 10 // 3, N * 8 // 3, 1),
            nn.LeakyReLU(inplace=True),
            # nn.Conv2d(N * 8 // 3, N * 6 // 3, 1),
            nn.Conv2d(N * 8 // 3, N * 3 * self.K, 1),  # K * (scales, means, weights)
        )

        self.gaussian_conditional = GaussianMixtureConditional(K=self.K)

    def forward(self, x):
        y = self.g_a(x)
        z = self.h_a(y)
        z_hat, z_likelihoods = self.entropy_bottleneck(z)
        params = self.h_s(z_hat)

        y_hat = self.gaussian_conditional.quantize(
            y, "noise" if self.training else "dequantize"
        )
        ctx_params = self.context_prediction(y_hat)
        gaussian_params = self.entropy_parameters(
            torch.cat((params, ctx_params), dim=1)
        )
        # print(f"gaussian_params.shape is {gaussian_params.shape}")  # [8, 1728, 16, 16]

        # scales_hat, means_hat = gaussian_params.chunk(2, 1)
        scales_hat, means_hat, weight_hat = gaussian_params.chunk(3, 1)
        B, C, H, W = weight_hat.shape  # C is M * K (here M * 3)
        # normalize the weights over the K mixture components
        weight_hat = nn.functional.softmax(
            weight_hat.reshape(B, self.K, C // self.K, H, W), dim=1
        ).reshape(B, C, H, W)

        # _, y_likelihoods = self.gaussian_conditional(y, scales_hat, means=means_hat)
        y_hat1, y_likelihoods = self.gaussian_conditional(
            y, scales_hat, means_hat, weights=weight_hat
        )

        # x_hat = self.g_s(y_hat)
        x_hat = self.g_s(y_hat1)

        return {
            "x_hat": x_hat,
            "likelihoods": {"y": y_likelihoods, "z": z_likelihoods},
        }
```
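
A hypothetical sanity check of the forward pass on a dummy input (assuming the `Cheng2020GMM` class above is in scope) would look like this:

```python
import torch

model = Cheng2020GMM(N=192)
model.train()  # use the additive-noise branch of quantize()

x = torch.rand(1, 3, 256, 256)  # dummy RGB image
out = model(x)

print(out["x_hat"].shape)             # expect torch.Size([1, 3, 256, 256])
print(out["likelihoods"]["y"].shape)
print(out["likelihoods"]["z"].shape)
```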

I compared Cheng2020GMM and Cheng2020Anchor, and the results are confusing:

GMM: [screenshot of rate-distortion results]

Anchor: [screenshot of rate-distortion results]

The GMM is inferior to the anchor, and I am unable to understand why. If you have any insights here, please help me out at your convenience!

Thanks again for your valuable time.

Best wishes.

watwwwww commented 1 day ago

Hi! I have the same problem. Have you found a good solution yet?

YodaEmbedding commented 1 day ago

Perhaps try using STE for quantization instead of noise.

Still, it's weird that GMM K=3 performs that much worse than GC. Try setting K=1 and training. Is the performance still worse?
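
A minimal sketch of STE rounding (write your own helper or check whether your CompressAI version ships one; the helper name below is just illustrative):

```python
import torch


def quantize_ste(y: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator: round in the forward pass,
    # pass gradients through unchanged in the backward pass.
    return y + (torch.round(y) - y).detach()


# e.g. in forward() during training: y_hat = quantize_ste(y)
# instead of quantizing with additive uniform noise.
```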