Open YunuoChen opened 1 month ago

Thank you so much for your reimplementation of ELIC. However, ELIC's original paper reports a decoding latency of much less than 100 ms, whereas the ELIC model from CompressAI takes about 130 ms in my tests. May I ask why there is this speed gap? Looking forward to your reply.
The measured decoding latency may depend on the CPU, GPU, or other conditions. The authors describe their setup in the Supplementary Material, under "2. Detailed experimental settings":
> We implement, train, and evaluate all learning-based models on PyTorch 1.8.1. We use NVIDIA TITANXP to test both RD performance and inference speed. To test the speeds, we reproduce previously proposed models and evaluate them under the same running conditions for fair comparison. Since most of the models adopt reparameterization techniques, we fix the reparameterized weights before testing the speed. We follow a common protocol to test the latency with GPU synchronization. When testing each model, we drop the latency results (we do not drop them when evaluating RD performance) of the first 6 images to get rid of the influence of device warm-up, and average the running time of remained images to get the precious speed results.
>
> We do not enable the deterministic inference mode (e.g. `torch.backends.cudnn.deterministic`) when testing the model speeds for two reasons. First, we tend to believe that the deterministic issue can be well solved with engineering efforts, such as using integer-only inference. Thus, the deterministic floating-point inference is unnecessary. Second, the deterministic mode extremely slows down the speed of specific operators, like transposed convolutions which are adopted by ELIC and earlier baseline models (Balle et al. and Minnen et al.), making the comparison somewhat unfair.
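For reference, a minimal sketch of that kind of timing protocol (GPU synchronization, dropping the first few images as device warm-up, then averaging) could look like the snippet below. `run_once` and `inputs` are placeholders for whatever decode call and image list are being timed, not CompressAI APIs:

```python
import time

import torch


def measure_latency_ms(run_once, inputs, num_warmup=6):
    """Average per-call latency in ms, skipping the first `num_warmup` calls."""
    # Following the quoted protocol, torch.backends.cudnn.deterministic is
    # left at its default (False) while timing.
    latencies = []
    for i, x in enumerate(inputs):
        torch.cuda.synchronize()  # don't attribute pending GPU work to this call
        start = time.perf_counter()
        with torch.no_grad():
            run_once(x)
        torch.cuda.synchronize()  # wait for this call's GPU work to finish
        if i >= num_warmup:  # drop warm-up iterations
            latencies.append((time.perf_counter() - start) * 1000.0)
    return sum(latencies) / len(latencies)
```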
Relative measurements may be more worthwhile when comparing across different machines; see the latency chart in their paper.
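As a rough, hypothetical illustration (the helper below is mine, not part of CompressAI): measure each model on the same machine and report speed relative to a chosen baseline, so the result is a ratio rather than GPU-specific milliseconds.

```python
# Hypothetical helper: express each model's decode latency, all measured on
# the same machine, relative to a chosen baseline (>1 means faster).
def relative_speed(latency_ms: dict[str, float], baseline: str) -> dict[str, float]:
    return {name: latency_ms[baseline] / ms for name, ms in latency_ms.items()}


# Usage with placeholder numbers (not measurements):
# relative_speed({"minnen2018": 250.0, "elic": 130.0}, baseline="minnen2018")
```

A ratio like this transfers across machines better than absolute milliseconds, since both models see the same GPU, driver, and PyTorch version.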
> [!NOTE]
> The above comment still applies.
I took another look at the paper, and it says:
> The architecture of $g_{\text{ch}}$ network is frankly sketched from Minnen et al. (2020). [...] The parameter aggregation network linearly reduces the dimensions to $2M^{(k)}$.
Our current implementation uses a `sequential_channel_ramp` for $g_{\text{ch}}$, but the paper description seems to suggest:
```python
# In [He2022], this is labeled "g_ch^(k)".
channel_context = {
    f"y{k}": nn.Sequential(
        conv(sum(self.groups[:k]), 224, kernel_size=5, stride=1),
        nn.ReLU(inplace=True),
        conv(224, 128, kernel_size=5, stride=1),
        nn.ReLU(inplace=True),
        conv(128, self.groups[k] * 2, kernel_size=5, stride=1),
    )
    for k in range(1, len(self.groups))
}
```
However, our current `param_aggregation` seems to match the paper:
```python
# In [He2022], this is labeled "Param Aggregation".
param_aggregation = [
    sequential_channel_ramp(
        # Input: spatial context, channel context, and hyper params.
        self.groups[k] * 2 + (k > 0) * self.groups[k] * 2 + N * 2,
        self.groups[k] * 2,
        min_ch=N * 2,
        num_layers=3,
        interp="linear",
        make_layer=nn.Conv2d,
        make_act=lambda: nn.ReLU(inplace=True),
        kernel_size=1,
        stride=1,
        padding=0,
    )
    for k in range(len(self.groups))
]
```
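For context on what "linearly reduces the dimensions" means here: as I understand it, `sequential_channel_ramp` stacks `num_layers` layers whose channel widths are interpolated from the input width down to the output width, with intermediate widths clamped below by `min_ch`. The snippet below is only a rough sketch of that assumed behavior, not the actual CompressAI helper:

```python
import numpy as np
import torch.nn as nn


def channel_ramp_sketch(in_ch, out_ch, *, min_ch=0, num_layers=3,
                        make_layer=nn.Conv2d,
                        make_act=lambda: nn.ReLU(inplace=True),
                        **layer_kwargs):
    """Rough sketch (assumed behavior): layer widths interpolated linearly
    from in_ch to out_ch, intermediate widths clamped to at least min_ch."""
    widths = np.linspace(in_ch, out_ch, num_layers + 1).round().astype(int)
    widths[1:-1] = np.maximum(widths[1:-1], min_ch)
    layers = []
    for i in range(num_layers):
        layers.append(make_layer(int(widths[i]), int(widths[i + 1]), **layer_kwargs))
        if i < num_layers - 1:  # no activation after the final layer
            layers.append(make_act())
    return nn.Sequential(*layers)
```

Under that reading, each `param_aggregation[k]` is a stack of 1×1 convolutions whose widths shrink roughly linearly toward `self.groups[k] * 2`, which is why it appears to match the paper's "linearly reduces the dimensions to $2M^{(k)}$" wording, even though the $g_{\text{ch}}$ branch differs from the fixed 224/128 design quoted above.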