InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

EntropyBottleneck with adjustable bin width #308

Open angus27rzz opened 6 days ago

angus27rzz commented 6 days ago

Feature

Support for Custom Bin Width in EntropyBottleneck

Motivation

So far only a bin width of 1 is supported, but it would be good to have this as a tunable option.

Additional context

Are quantize and _likelihood the only methods that need to change, or are there other important changes I am missing? I'm not sure about _get_medians.

Here is how I would change quantize:

def quantize(
    self,
    inputs: Tensor,
    mode: str,
    means: Optional[Tensor] = None,
    bin_width: float = 1.0,
) -> Tensor:
    if mode not in ("noise", "dequantize", "symbols"):
        raise ValueError(f'Invalid quantization mode: "{mode}"')

    if mode == "noise":
        # Additive uniform noise over one bin (training-time proxy for quantization).
        half = bin_width / 2
        noise = torch.empty_like(inputs).uniform_(-half, half)
        inputs = inputs + noise
        return inputs

    outputs = inputs.clone()
    if means is not None:
        outputs -= means

    # Round to the nearest multiple of bin_width.
    outputs = torch.round(outputs / bin_width) * bin_width

    if mode == "dequantize":
        if means is not None:
            outputs += means
        return outputs

    assert mode == "symbols", mode
    # Return integer symbol indices; divide out bin_width before casting,
    # since .int() would truncate non-integer multiples of bin_width.
    outputs = torch.round(outputs / bin_width).int()
    return outputs

and here is _likelihood:

def _likelihood(
    self, inputs: Tensor, bin_width: float = 1.0, stop_gradient: bool = False
) -> Tuple[Tensor, Tensor, Tensor]:
    # Integrate the density over one bin of width bin_width around each input.
    half = bin_width / 2
    lower = self._logits_cumulative(inputs - half, stop_gradient=stop_gradient)
    upper = self._logits_cumulative(inputs + half, stop_gradient=stop_gradient)
    likelihood = torch.sigmoid(upper) - torch.sigmoid(lower)
    return likelihood, lower, upper

Any further guidance is appreciated. Thank you!

YodaEmbedding commented 5 days ago

At a quick glance, this should work for training, though I wonder if the lossless entropy coder also needs adjustment.

A simpler approach might be to rescale by the desired bin width instead: divide the input by bin_width before the bottleneck and multiply the dequantized output back afterwards. Since both are uniform quantizers, it should be equivalent (see the sketch below).
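
Something along these lines (a minimal sketch; ScaledEntropyBottleneck is just a hypothetical wrapper, not part of CompressAI):

import torch
from torch import Tensor
from compressai.entropy_models import EntropyBottleneck


class ScaledEntropyBottleneck(torch.nn.Module):
    """Wraps a stock EntropyBottleneck so the effective bin width on the
    input is `bin_width`, by rescaling before and after the bottleneck."""

    def __init__(self, channels: int, bin_width: float = 1.0):
        super().__init__()
        self.entropy_bottleneck = EntropyBottleneck(channels)
        self.bin_width = bin_width

    def forward(self, y: Tensor):
        # Unit-width quantization of y / bin_width is the same as
        # bin_width-width quantization of y, up to the final rescale.
        y_hat, y_likelihoods = self.entropy_bottleneck(y / self.bin_width)
        return y_hat * self.bin_width, y_likelihoods


# Usage with the usual (N, C, H, W) latent layout:
eb = ScaledEntropyBottleneck(channels=128, bin_width=0.5)
y = torch.randn(1, 128, 16, 16)
y_hat, y_likelihoods = eb(y)

This keeps the learned density and the lossless entropy coder untouched, since everything inside the bottleneck still sees a unit bin width.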