InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Question about the gaussian_conditional model in Cheng2020 #316

Open achel-x opened 2 weeks ago

achel-x commented 2 weeks ago

When I run the Cheng2020 series, I noticed that Cheng2020Anchor inherits from JointAutoregressiveHierarchicalPriors. However, the `gaussian_conditional` model in `JointAutoregressiveHierarchicalPriors` is a plain GaussianConditional, not a Gaussian mixture model.

I tried adding a line in Cheng2020Anchor:

self.gaussian_conditional = GaussianMixtureConditional()

But it failed to run.

[screenshot of the resulting error]

What do the weights mean, and what should I pass to `GaussianMixtureConditional()`?

chunbaobao commented 1 week ago

The discretized Gaussian mixture likelihood follows the equation in the paper:

$$p_{\hat{y}_i \mid \hat{z}}(\hat{y}_i \mid \hat{z}) = \left( \sum_{k=1}^{K} \omega_i^{(k)} \, \mathcal{N}\big(\mu_i^{(k)}, \sigma_i^{2(k)}\big) * \mathcal{U}\big(-\tfrac{1}{2}, \tfrac{1}{2}\big) \right)(\hat{y}_i)$$

In this equation, $\omega$ refers to the weights in the code:

https://github.com/InterDigitalInc/CompressAI/blob/743680befc146a6d8ee7840285584f2ce00c3732/compressai/entropy_models/entropy_models.py#L735-L751

Usually, the parameters of the latent distribution, including the weights, are outputs of a neural network.
You can slightly modify the network's output to obtain the weights.

https://github.com/InterDigitalInc/CompressAI/blob/743680befc146a6d8ee7840285584f2ce00c3732/compressai/models/google.py#L534-L554
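
For intuition, here is a minimal sketch (not the CompressAI implementation) of how that discretized mixture likelihood can be evaluated, assuming per-component tensors of shape `(B, K, M, H, W)` whose weights are already normalized over the `K` dimension:

```python
import torch
from torch.distributions import Normal


def gmm_bin_likelihood(y_hat, scales, means, weights):
    # y_hat:                  (B, M, H, W)    quantized latents
    # scales, means, weights: (B, K, M, H, W) per-component parameters,
    #                         weights summing to 1 over dim=1
    dist = Normal(means, scales.clamp(min=1e-9))
    y = y_hat.unsqueeze(1)  # broadcast over the K mixture components

    # probability mass of the quantization bin [y - 0.5, y + 0.5] per component
    per_component = dist.cdf(y + 0.5) - dist.cdf(y - 0.5)

    # weighted sum over the mixture components -> (B, M, H, W)
    return (weights * per_component).sum(dim=1)
```

The actual entropy model in CompressAI additionally handles details such as a lower bound on the likelihood for numerical stability.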

achel-x commented 4 days ago

Thanks for your kind instructions.

I tried a modification based on the issue linked here:

https://github.com/InterDigitalInc/CompressAI/issues/289#issuecomment-2485716961

I tried to make the same modification, using GaussianMixtureConditional as shown below:

```python
import torch
import torch.nn as nn

# adjust import paths to your CompressAI version
from compressai.entropy_models import GaussianMixtureConditional
from compressai.models import Cheng2020Anchor


class Cheng2020GMM(Cheng2020Anchor):
    def __init__(self, N=192, **kwargs):
        super().__init__(N=N, **kwargs)

        self.K = 3  # number of mixture components for the GMM

        self.entropy_parameters = nn.Sequential(
            nn.Conv2d(N * 12 // 3, N * 10 // 3, 1),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(N * 10 // 3, N * 8 // 3, 1),
            nn.LeakyReLU(inplace=True),
            # nn.Conv2d(N * 8 // 3, N * 6 // 3, 1),
            nn.Conv2d(N * 8 // 3, N * 3 * self.K, 1),  # K * (scales, means, weights)
        )

        self.gaussian_conditional = GaussianMixtureConditional(K=self.K)

    def forward(self, x):
        y = self.g_a(x)
        z = self.h_a(y)
        z_hat, z_likelihoods = self.entropy_bottleneck(z)
        params = self.h_s(z_hat)

        y_hat = self.gaussian_conditional.quantize(
            y, "noise" if self.training else "dequantize"
        )
        ctx_params = self.context_prediction(y_hat)
        gaussian_params = self.entropy_parameters(
            torch.cat((params, ctx_params), dim=1)
        )
        # print(f"gaussian_params.shape is {gaussian_params.shape}")  # [8, 1728, 16, 16]

        # scales_hat, means_hat = gaussian_params.chunk(2, 1)
        scales_hat, means_hat, weight_hat = gaussian_params.chunk(3, 1)
        B, C, H, W = weight_hat.shape  # C is M * K (here M * 3)
        # normalize the weights over the K mixture components
        weight_hat = nn.functional.softmax(
            weight_hat.reshape(B, self.K, C // self.K, H, W), dim=1
        ).reshape(B, C, H, W)

        # _, y_likelihoods = self.gaussian_conditional(y, scales_hat, means=means_hat)
        y_hat1, y_likelihoods = self.gaussian_conditional(
            y, scales_hat, means_hat, weights=weight_hat
        )

        # x_hat = self.g_s(y_hat)
        x_hat = self.g_s(y_hat1)

        return {
            "x_hat": x_hat,
            "likelihoods": {"y": y_likelihoods, "z": z_likelihoods},
        }
```
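
A hypothetical sanity check of the forward pass on a dummy input (assuming the `Cheng2020GMM` class above is in scope) would look like this:

```python
import torch

model = Cheng2020GMM(N=192)
model.train()  # use the additive-noise branch of quantize()

x = torch.rand(1, 3, 256, 256)  # dummy RGB image
out = model(x)

print(out["x_hat"].shape)             # expect torch.Size([1, 3, 256, 256])
print(out["likelihoods"]["y"].shape)
print(out["likelihoods"]["z"].shape)
```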

I compared Cheng2020GMM and Cheng2020Anchor, and the results are confusing:

GMM: [screenshot of rate-distortion results]

Anchor: [screenshot of rate-distortion results]

The GMM is inferior to the anchor, and I am unable to understand why. If you have any insights here, please help me out at your convenience!

Thanks again for your valuable time.

Best wishes.

watwwwww commented 1 day ago

Hi! I have the same problem. Have you found a good solution yet?

YodaEmbedding commented 1 day ago

Perhaps try using STE for quantization instead of noise.

Still, it's weird that GMM K=3 performs that much worse than GC. Try setting K=1 and training. Is the performance still worse?
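
A minimal sketch of STE rounding (write your own helper or check whether your CompressAI version ships one; the helper name below is just illustrative):

```python
import torch


def quantize_ste(y: torch.Tensor) -> torch.Tensor:
    # Straight-through estimator: round in the forward pass,
    # pass gradients through unchanged in the backward pass.
    return y + (torch.round(y) - y).detach()


# e.g. in forward() during training: y_hat = quantize_ste(y)
# instead of quantizing with additive uniform noise.
```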