InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

About the decoding speed of the ELIC model #312

Open YunuoChen opened 1 month ago

YunuoChen commented 1 month ago

Thank you so much for your reimplementation of ELIC. The original ELIC paper reports a decoding latency well under 100 ms, but when I test the ELIC model from CompressAI, the latency is about 130 ms. May I ask why there is a speed gap? Looking forward to your reply.

YodaEmbedding commented 1 month ago

The measured decoding latency may depend on the CPU, GPU, or other conditions.

Their setup is mentioned here:

> Supplementary Material
>
> 2\. Detailed experimental settings
>
> We implement, train, and evaluate all learning-based models on PyTorch 1.8.1. We use NVIDIA TITANXP to test both RD performance and inference speed. To test the speeds, we reproduce previously proposed models and evaluate them under the same running conditions for fair comparison. Since most of the models adopt reparameterization techniques, we fix the reparameterized weights before testing the speed. We follow a common protocol to test the latency with GPU synchronization. When testing each model, we drop the latency results (we do not drop them when evaluating RD performance) of the first 6 images to get rid of the influence of device warm-up, and average the running time of remained images to get the precious speed results.
>
> We do not enable the deterministic inference mode (e.g. torch.backends.cudnn.deterministic) when testing the model speeds for two reasons. First, we tend to believe that the deterministic issue can be well solved with engineering efforts, such as using integer-only inference. Thus, the deterministic floating-point inference is unnecessary. Second, the deterministic mode extremely slows down the speed of specific operators, like transposed convolutions which are adopted by ELIC and earlier baseline models (Balle et al. and Minnen et al.), making the comparison somewhat unfair.

arXiv:2203.10886v2 [cs.CV] 29 Mar 2022

Relative measurements may be more meaningful when comparing across different machines. Their chart reports per-model latencies measured under those conditions, which can serve as a reference point.
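
For anyone who wants to reproduce a comparable measurement locally, here is a minimal sketch of that timing protocol (GPU-synchronized timing, the first few images dropped for warm-up, deterministic cuDNN mode left off). It assumes `net` is a CompressAI model on a CUDA device, updated via `net.update()`, with the usual `compress`/`decompress` interface:

    import time

    import torch

    @torch.no_grad()
    def mean_decode_latency_ms(net, images, warmup=6):
        # `images`: an iterable of [1, 3, H, W] CUDA tensors on the same device as `net`.
        latencies = []
        for i, x in enumerate(images):
            out_enc = net.compress(x)
            torch.cuda.synchronize()
            start = time.perf_counter()
            net.decompress(out_enc["strings"], out_enc["shape"])
            torch.cuda.synchronize()
            stop = time.perf_counter()
            if i >= warmup:  # drop the first `warmup` images to remove device warm-up effects
                latencies.append((stop - start) * 1e3)
        return sum(latencies) / len(latencies)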

YodaEmbedding commented 1 week ago

> [!NOTE]
> The above comment still applies.

I took another look at the paper, and it says:

> The architecture of $g_{\text{ch}}$ network is frankly sketched from Minnen et al. (2020). [...] The parameter aggregation network linearly reduces the dimensions to $2 M^{k}$.

Our current implementation uses a `sequential_channel_ramp` for $g_{\text{ch}}$, but the paper's description seems to suggest:

        # In [He2022], this is labeled "g_ch^(k)".
        channel_context = {
            f"y{k}": nn.Sequential(
                conv(sum(self.groups[:k]), 224, kernel_size=5, stride=1),
                nn.ReLU(inplace=True),
                conv(224, 128, kernel_size=5, stride=1),
                nn.ReLU(inplace=True),
                conv(128, self.groups[k] * 2, kernel_size=5, stride=1),
            )
            for k in range(1, len(self.groups))
        }
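
As a quick shape check of the block above, here is a self-contained sketch, assuming the uneven grouping from the paper (16, 16, 32, 64, 192 for $M = 320$) and a plain same-padding `nn.Conv2d` standing in for the `conv` helper:

    import torch
    import torch.nn as nn

    def conv(in_ch, out_ch, kernel_size=5, stride=1):
        # Stand-in for the same-padding conv helper used above.
        return nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding=kernel_size // 2)

    groups = [16, 16, 32, 64, 192]  # uneven grouping from the paper for M = 320

    k = 2  # third group, conditioned on the 16 + 16 previously decoded channels
    g_ch_k = nn.Sequential(
        conv(sum(groups[:k]), 224, kernel_size=5, stride=1),
        nn.ReLU(inplace=True),
        conv(224, 128, kernel_size=5, stride=1),
        nn.ReLU(inplace=True),
        conv(128, groups[k] * 2, kernel_size=5, stride=1),
    )

    y_prev = torch.rand(1, sum(groups[:k]), 16, 16)  # previously decoded groups y0, y1
    print(g_ch_k(y_prev).shape)  # torch.Size([1, 64, 16, 16]), i.e. 2 * groups[k] channels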

However, our current `param_aggregation` seems to match the paper:

        # In [He2022], this is labeled "Param Aggregation".
        param_aggregation = [
            sequential_channel_ramp(
                # Input: spatial context, channel context, and hyper params.
                self.groups[k] * 2 + (k > 0) * self.groups[k] * 2 + N * 2,
                self.groups[k] * 2,
                min_ch=N * 2,
                num_layers=3,
                interp="linear",
                make_layer=nn.Conv2d,
                make_act=lambda: nn.ReLU(inplace=True),
                kernel_size=1,
                stride=1,
                padding=0,
            )
            for k in range(len(self.groups))
        ]
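
For reference, "linearly reduces the dimensions" here means the per-layer channel widths are interpolated linearly between the input and output counts (floored at `min_ch`). A rough, self-contained sketch of such a ramp built from 1x1 convolutions, not the actual `sequential_channel_ramp` implementation, could look like:

    import torch.nn as nn

    def linear_channel_ramp_sketch(in_ch, out_ch, min_ch, num_layers=3):
        # Channel widths interpolated linearly from in_ch to out_ch,
        # with intermediate widths floored at min_ch.
        widths = [
            round(in_ch + (out_ch - in_ch) * i / num_layers)
            for i in range(num_layers + 1)
        ]
        widths = [max(w, min_ch) for w in widths[:-1]] + [out_ch]
        layers = []
        for i in range(num_layers):
            layers.append(nn.Conv2d(widths[i], widths[i + 1], kernel_size=1, stride=1, padding=0))
            if i < num_layers - 1:
                layers.append(nn.ReLU(inplace=True))
        return nn.Sequential(*layers)

    # For example, for the last group with N = 192 and groups = [16, 16, 32, 64, 192]:
    # in_ch = 192*2 + 192*2 + 192*2 = 1152, out_ch = 384, min_ch = N*2 = 384,
    # giving per-layer widths 1152 -> 896 -> 640 -> 384.
    pa_last = linear_channel_ramp_sketch(1152, 384, min_ch=384)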