For `forward`, during training, output both `y_likelihoods` and `z_likelihoods`:
```python
def forward(self, x, training=None):
    if training is None:
        training = self.training

    y = self.g_a(x)

    # Adapted from FactorizedPrior:
    y_infer_hat, y_infer_likelihoods = self.entropy_bottleneck_y(y.detach())

    if training:
        # Copied from MeanScaleHyperprior:
        z = self.h_a(y)
        z_hat, z_likelihoods = self.entropy_bottleneck(z)
        gaussian_params = self.h_s(z_hat)
        scales_hat, means_hat = gaussian_params.chunk(2, 1)
        y_hat, y_likelihoods = self.gaussian_conditional(y, scales_hat, means_hat)
        likelihoods = {
            "y": y_likelihoods,
            "z": z_likelihoods,
            "y_infer": y_infer_likelihoods,
        }
    else:
        y_hat = y_infer_hat
        likelihoods = {
            "y_infer": y_infer_likelihoods,
        }

    x_hat = self.g_s(y_hat)

    if not training:
        # Optionally avoid training g_s when optimizing only the inference-mode loss.
        # This can be done for any other outputs of y_hat, too.
        # In practice, it shouldn't really matter, though.
        # Another easy alternative is simply to freeze g_s's weights.
        #
        # x_hat = x_hat.detach()
        #
        # Optional:
        # x_hat = x_hat.clamp_(0, 1)
        pass

    return {
        "x_hat": x_hat,
        "likelihoods": likelihoods,
    }
```
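As a rough sketch of how the extra `"y_infer"` term could enter the training loss: CompressAI's stock `RateDistortionLoss` sums the bpp contribution of every entry in `output["likelihoods"]`, so the simplest option is to use it unchanged. If you want to weight the inference-path rate separately, something along these lines could work; the `alpha` knob and the class itself are hypothetical, not part of the library:

```python
import math

import torch
import torch.nn as nn


class WeightedRateDistortionLoss(nn.Module):
    """Sketch of an RD loss that weights the "y_infer" bpp term separately.

    lmbda trades off distortion vs. rate (as in CompressAI's RateDistortionLoss);
    alpha (hypothetical) scales the rate of the inference-time factorized path.
    """

    def __init__(self, lmbda=1e-2, alpha=1.0):
        super().__init__()
        self.mse = nn.MSELoss()
        self.lmbda = lmbda
        self.alpha = alpha

    def forward(self, output, target):
        N, _, H, W = target.size()
        num_pixels = N * H * W

        def bpp(likelihoods):
            # bits per pixel: -log2(p) summed over all elements, normalized by pixels
            return torch.log(likelihoods).sum() / (-math.log(2) * num_pixels)

        likelihoods = output["likelihoods"]
        # Rate of the training-time hyperprior path ("y", "z"), if present.
        bpp_train = sum(bpp(l) for k, l in likelihoods.items() if k != "y_infer")
        # Rate of the inference-time factorized path ("y_infer").
        bpp_infer = bpp(likelihoods["y_infer"])

        out = {}
        out["bpp_loss"] = bpp_train + self.alpha * bpp_infer
        out["mse_loss"] = self.mse(output["x_hat"], target)
        out["loss"] = self.lmbda * 255**2 * out["mse_loss"] + out["bpp_loss"]
        return out
```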
The `compress`/`decompress` methods can be adapted from `FactorizedPrior`:
```python
def compress(self, x):
    y = self.g_a(x)
    y_strings = self.entropy_bottleneck_y.compress(y)
    return {"strings": [y_strings], "shape": y.size()[-2:]}

def decompress(self, strings, shape):
    assert isinstance(strings, list) and len(strings) == 1
    y_hat = self.entropy_bottleneck_y.decompress(strings[0], shape)
    x_hat = self.g_s(y_hat).clamp_(0, 1)
    return {"x_hat": x_hat}
```
Unless I'm misunderstanding something, the pretrained hyperprior weights should also work with this architecture. You can load those, freeze all of those weights, and then train only `entropy_bottleneck_y`.
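A minimal sketch of that setup, assuming `net` is the model above with the pretrained hyperprior weights already loaded:

```python
import torch

# Freeze everything...
for p in net.parameters():
    p.requires_grad = False
# ...except the new factorized bottleneck over y.
for p in net.entropy_bottleneck_y.parameters():
    p.requires_grad = True

# Optimize only the trainable parameters.
optimizer = torch.optim.Adam(
    (p for p in net.parameters() if p.requires_grad), lr=1e-4
)

# Note: as in CompressAI's example training script, the entropy bottleneck's
# quantiles are usually driven by the auxiliary loss (net.aux_loss()),
# often via a separate aux optimizer.
```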
Thank you very much for your response. If I follow the architectural setting provided by the authors (Ahuja et al., CVPR 2023), I will have a different number of channels for `y` (64 channels) and `z` (8 channels). This further implies that I would have to declare a separate `self.entropy_bottleneck2` for `y`. This would mean that I cannot use the pretrained hyperprior weights, since no `entropy_bottleneck2` exists in them. Is there any workaround for this? How do I incorporate this during training?
Ah yes. There are a few possible approaches:

1. Load the pretrained hyperprior weights, freeze everything, and train only the new `entropy_bottleneck_y`.
2. Detach `y` before feeding it to `entropy_bottleneck_y` (which keeps `g_a` from receiving gradients), add the likelihoods for `entropy_bottleneck_y` to the loss, and retrain. Discard `y_infer_hat`; in this simple setup, it's equal to `y_hat` anyway.

I've updated the code above with another `entropy_bottleneck_y`. (Effectively approach 2.)
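For the channel-mismatch question, here is a rough sketch of how the extra bottleneck could be declared and how the pretrained checkpoint can still be loaded despite the missing key. The class name, the choice to subclass `MeanScaleHyperprior` (your actual feature-compression transforms will differ), the checkpoint path, and the `N=8`/`M=64` channel counts (z: 8, y: 64, as quoted above) are all assumptions for illustration; depending on your CompressAI version you may need to make sure `strict` is forwarded through the model's `load_state_dict` override, or filter the state dict yourself.

```python
import torch

from compressai.entropy_models import EntropyBottleneck
from compressai.models import MeanScaleHyperprior


class FactorizedInferenceHyperprior(MeanScaleHyperprior):
    """Hyperprior during training, factorized bottleneck over y at inference."""

    def __init__(self, N=8, M=64, **kwargs):
        super().__init__(N=N, M=M, **kwargs)  # entropy_bottleneck covers z (N=8 channels)
        # Extra factorized bottleneck over y (M=64 channels), used at inference time.
        self.entropy_bottleneck_y = EntropyBottleneck(M)

    # forward / compress / decompress as in the snippets above.


net = FactorizedInferenceHyperprior()

# Load pretrained hyperprior weights; strict=False skips the entropy_bottleneck_y.*
# keys missing from the checkpoint, so that module stays randomly initialized.
# Adjust the "state_dict" key to however your checkpoint was actually saved.
checkpoint = torch.load("pretrained_hyperprior.pth.tar", map_location="cpu")
net.load_state_dict(checkpoint["state_dict"], strict=False)
```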
Thank you very much for your response. I actually tried both of the approaches, just to have some comparison.

Approach 1 (Idea 1): Freeze everything and train `entropy_bottleneck_y`.

Setting: The codec was initially trained for 30 epochs. After adding `entropy_bottleneck_y`, I trained it again for 30 epochs; since only a single layer was trainable, this was actually quite fast.

Results:
- Original hyperprior codec: Bpp = 0.24, PSNR = 32.89
- After training `entropy_bottleneck_y` and throwing away the hyperprior: Bpp = 0.254, PSNR = 32.24

Remarks: I think the results look reasonable; it's obvious that the factorized prior is not as bitrate-efficient as the hyperprior. I believe it's not possible to achieve the exact same (or a better) RD tradeoff; some degradation is expected.

Approach 2 (Idea 2): Use your given code and retrain the codec from scratch. The detach trick makes it possible to do so.

Setting: The codec was trained for 30 epochs. Note that the same `alpha` was used as in the experiment above, but it resulted in a different tradeoff.

Results:
- Original hyperprior codec: Bpp = 0.24, PSNR = 32.89
- After adding the codec as given above: Bpp = 0.165, PSNR = 31.6

Remarks: The same value of `alpha` resulted in a completely different tradeoff. I probably need to increase `alpha` to reach the same or a nearby tradeoff.

I think Idea 1 is more useful to me, as it gives more control and you can also use the pre-trained weights. Please let me know if something looks unusual or if there is room for improvement.
Hi,
First of all, thanks for the great work; it really simplifies the maintenance and development of codecs, which are otherwise very hard to build. I am actually trying to add another codec to the CompressAI library, but I need some quick suggestions, since it works a bit differently than typical codecs.

I have the following feature compression codec by Ahuja et al., CVPR 2023. Its training objective remains the same as that of the original hyperprior architecture by Ballé et al., ICLR 2018; specifically, the rate term is given as

$$\text{Rate} = \mathbb{E}\big[-\log_2 p(y \mid z) - \log_2 p(z)\big]$$
According to the diagram of the codec in the paper above, I should do the following using the implementation of the hyperprior from `compressai/models/google.py`. As you already know, `self.gaussian_conditional` depends on `scales_hat` during both training and inference. However, in the codec shown above, they somehow compute the `self.gaussian_conditional` term only during training, for the loss calculation, but throw it away during inference. Is there a way I can tweak the code given above so that I can also do what they are proposing? Thank you very much.