InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

feat(models): add Checkerboard and ELIC #243

Closed · YodaEmbedding closed 7 months ago

YodaEmbedding commented 1 year ago

Checkerboard and ELIC models.

Differences from original paper:


Checklist:


Compatibility check with Chandelier ELiC-ReImplemetation pretrained models

Download models from: https://github.com/VincentChandelier/ELiC-ReImplemetation#available-checkpoint

Save the following Python script as `rename_weights_elic.py`:

<details>
<summary>Click for python script</summary>

```python
import argparse

import torch

# Map keys from the ELiC-ReImplemetation state dict to CompressAI's
# latent-codec naming scheme.
FORWARD_MAPPING = {
    "g_a": "g_a",
    "g_s": "g_s",
    "h_a": "latent_codec.hyper.h_a",
    "h_s": "latent_codec.hyper.h_s",
    "entropy_bottleneck": "latent_codec.hyper.entropy_bottleneck",
    "cc_transforms.0": "latent_codec.y.channel_context.y1",
    "cc_transforms.1": "latent_codec.y.channel_context.y2",
    "cc_transforms.2": "latent_codec.y.channel_context.y3",
    "cc_transforms.3": "latent_codec.y.channel_context.y4",
    "context_prediction.0": "latent_codec.y.latent_codec.y0.context_prediction",
    "context_prediction.1": "latent_codec.y.latent_codec.y1.context_prediction",
    "context_prediction.2": "latent_codec.y.latent_codec.y2.context_prediction",
    "context_prediction.3": "latent_codec.y.latent_codec.y3.context_prediction",
    "context_prediction.4": "latent_codec.y.latent_codec.y4.context_prediction",
    "ParamAggregation.0": "latent_codec.y.latent_codec.y0.entropy_parameters",
    "ParamAggregation.1": "latent_codec.y.latent_codec.y1.entropy_parameters",
    "ParamAggregation.2": "latent_codec.y.latent_codec.y2.entropy_parameters",
    "ParamAggregation.3": "latent_codec.y.latent_codec.y3.entropy_parameters",
    "ParamAggregation.4": "latent_codec.y.latent_codec.y4.entropy_parameters",
}

# The single source gaussian_conditional is duplicated into each of the
# five per-group destination modules.
REVERSE_MAPPING = {
    "latent_codec.y.latent_codec.y0.y.gaussian_conditional": "gaussian_conditional",
    "latent_codec.y.latent_codec.y1.y.gaussian_conditional": "gaussian_conditional",
    "latent_codec.y.latent_codec.y2.y.gaussian_conditional": "gaussian_conditional",
    "latent_codec.y.latent_codec.y3.y.gaussian_conditional": "gaussian_conditional",
    "latent_codec.y.latent_codec.y4.y.gaussian_conditional": "gaussian_conditional",
}


def _rename_key(key):
    found = False
    for src_prefix, dst_prefix in FORWARD_MAPPING.items():
        if key.startswith(src_prefix):
            new_key = f"{dst_prefix}{key[len(src_prefix):]}"
            yield new_key
            found = True
    for dst_prefix, src_prefix in REVERSE_MAPPING.items():
        if key.startswith(src_prefix):
            new_key = f"{dst_prefix}{key[len(src_prefix):]}"
            yield new_key
            found = True
    if found:
        return
    raise RuntimeError(f"Unmapped key: {key}")


def rename_keys(state_dict):
    max_len = max(len(key) for key in state_dict.keys())
    new_state_dict = {}
    for key, value in state_dict.items():
        for new_key in _rename_key(key):
            print(f"{key:<{max_len}} -> {new_key}")
            new_state_dict[new_key] = value
    return new_state_dict


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input", type=str, required=True, help="Path to the weights file"
    )
    parser.add_argument(
        "--output", type=str, required=True, help="Path to the output file"
    )
    return parser


def main():
    parser = build_parser()
    args = parser.parse_args()
    state_dict = torch.load(args.input)
    state_dict = rename_keys(state_dict)
    torch.save(state_dict, args.output)


if __name__ == "__main__":
    main()
```

</details>

Then, run:

```bash
python rename_weights_elic.py --input=ELIC_0004_ft_3980_Plateau.pth.tar --output=ELIC_0004_ft_3980_Plateau_renamed.pth.tar
python rename_weights_elic.py --input=ELIC_0008_ft_3980_Plateau.pth.tar --output=ELIC_0008_ft_3980_Plateau_renamed.pth.tar
python rename_weights_elic.py --input=ELIC_0016_ft_3980_Plateau.pth.tar --output=ELIC_0016_ft_3980_Plateau_renamed.pth.tar
python rename_weights_elic.py --input=ELIC_0032_ft_3980_Plateau.pth.tar --output=ELIC_0032_ft_3980_Plateau_renamed.pth.tar
python rename_weights_elic.py --input=ELIC_0150_ft_3980_Plateau.pth.tar --output=ELIC_0150_ft_3980_Plateau_renamed.pth.tar
python rename_weights_elic.py --input=ELIC_0450_ft_3980_Plateau.pth.tar --output=ELIC_0450_ft_3980_Plateau_renamed.pth.tar
```

Then load the model checkpoint in CompressAI Trainer:

```bash
compressai-train ++model.name="elic2022-chandelier" ++hp.N=192 ++hp.M=320 ++hp.groups='[16,16,32,64,192]' ++criterion.lmbda=0.004 ++paths.model_checkpoint="ELIC_0004_ft_3980_Plateau_renamed.pth.tar"
```
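For a quick check without the Trainer, the renamed weights can also be loaded directly in Python. A minimal sketch, assuming the class added in this PR is exposed as `compressai.models.Elic2022Chandelier` and accepts the same `N`/`M`/`groups` hyperparameters as the `++hp` overrides above (both are assumptions, not confirmed API):

```python
# Minimal sketch: load the renamed Chandelier weights directly.
# Class name and constructor arguments are assumptions mirroring the
# ++model.name / ++hp overrides above.
import torch

from compressai.models import Elic2022Chandelier

model = Elic2022Chandelier(N=192, M=320, groups=[16, 16, 32, 64, 192])
state_dict = torch.load("ELIC_0004_ft_3980_Plateau_renamed.pth.tar", map_location="cpu")
model.load_state_dict(state_dict)
model.update(force=True)  # rebuild the entropy coder's CDF tables
model.eval()

# Round-trip a dummy image through the actual entropy coder.
x = torch.rand(1, 3, 256, 256)
with torch.no_grad():
    enc = model.compress(x)
    dec = model.decompress(enc["strings"], enc["shape"])
print(dec["x_hat"].shape)  # expect torch.Size([1, 3, 256, 256])
```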
fracape commented 1 year ago

Should this be trained with the base config (ReduceLROnPlateau)?

YodaEmbedding commented 1 year ago

Yes, the default is fine. Judging by this implementation's reported results, it is possible to get very close to the paper's results. Like the paper, they also train on 8000 images from ImageNet (rather than Vimeo90K), but hopefully that has minimal effect.
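For reference, "default" here means the usual ReduceLROnPlateau setup from CompressAI's training examples; a rough sketch, with illustrative hyperparameters:

```python
import torch
import torch.optim as optim

net = torch.nn.Conv2d(3, 3, 3)  # stand-in for the actual codec
optimizer = optim.Adam(net.parameters(), lr=1e-4)
lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")

# After each validation epoch, step on the RD loss; the learning rate
# is reduced automatically once the loss plateaus.
val_loss = 1.0  # stand-in for the epoch's validation loss
lr_scheduler.step(val_loss)
```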

fracape commented 11 months ago

Hi @YodaEmbedding, I was trying to figure out where the problem is, but my results using the above setup (Vimeo90K, default ReduceLROnPlateau) do not reach the expected performance:

Did you launch a run on your end, or would you have a clue regarding a wrong setup here?

[Figure: checkerboard vs cheng2020-anchor RD comparison]

YodaEmbedding commented 11 months ago

I think that may have been due to a bug in compress/decompress, which I've now fixed. The current implementation seems to work in some limited testing. I think forward was fine, so the model may have trained correctly; thus, you may be able to get away with just loading the weights to check, rather than training from scratch.
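One way to verify this without retraining is to compare `forward` against the `compress`/`decompress` round trip, which is exactly where the bug lived. A hypothetical helper (not part of CompressAI):

```python
import torch


def max_roundtrip_error(model, x):
    """Compare forward() reconstruction against the compress/decompress
    round trip; a large gap suggests a compress/decompress bug like the
    one described above. (Hypothetical helper, not part of CompressAI.)"""
    model.update(force=True)  # rebuild CDF tables before entropy coding
    model.eval()
    with torch.no_grad():
        x_hat_forward = model(x)["x_hat"]
        enc = model.compress(x)
        x_hat_coded = model.decompress(enc["strings"], enc["shape"])["x_hat"]
    return (x_hat_forward - x_hat_coded).abs().max().item()
```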


Summary of recent changes:


I plugged in Chandelier's pretrained weights (trained on ImageNet 8000) and then "finetuned" on Vimeo90K for a few epochs.

[Figure: Chandelier pretrained model finetuned on Vimeo90K for 1 epoch]
[Figure: Chandelier pretrained model finetuned on Vimeo90K for 4 epochs]

Not unexpectedly, "finetuning" on a different, larger dataset actually improves RD performance further.

Not shown: finetuning 0 epochs, since I'm a bit too lazy to rerun things.


Suggestions:

YodaEmbedding commented 11 months ago

(Ignore.)

lin-toto commented 11 months ago

Regarding the `y - means` trick: it wouldn't work with GMM. We can't really pre-compute the distributions with GMM, because they depend on too many parameters.
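(For context, the trick in question is single-Gaussian mean-removal quantization; a rough sketch, with illustrative naming:)

```python
import torch


def quantize_with_means(y: torch.Tensor, means: torch.Tensor) -> torch.Tensor:
    """Single-Gaussian "y - means" trick: code round(y - means), which
    follows a zero-mean Gaussian whose CDF depends only on the predicted
    scale, so per-scale CDF tables can be pre-computed. A K-component GMM
    ties each element's distribution to K means, scales, and weights, so
    no single-parameter table lookup is possible.
    """
    symbols = torch.round(y - means)  # what the entropy coder sees
    y_hat = symbols + means           # reconstruction used downstream
    return y_hat
```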

lin-toto commented 11 months ago

Also, I'm not sure about ELIC, but FYI: in my experiments, Checkerboard + cheng2020 results differ by quite a lot between K=1 and K=3 (GMM components).
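(To make the `K` dependence concrete, here is an illustrative K-component GMM likelihood for a quantized latent; a sketch, not CompressAI's actual implementation:)

```python
import torch


def gmm_likelihood(y_hat, means, scales, weights):
    """Likelihood of quantized y_hat under a K-component GMM.

    means, scales, weights have shape (K, ...) broadcastable to y_hat,
    with weights summing to 1 over K. With K=1 this reduces to the usual
    single Gaussian conditional; with K=3 each element carries 9
    distribution parameters.
    """
    dist = torch.distributions.Normal(means, scales)
    upper = dist.cdf(y_hat + 0.5)  # integrate the density over the
    lower = dist.cdf(y_hat - 0.5)  # quantization bin [y-0.5, y+0.5]
    return (weights * (upper - lower)).sum(dim=0)
```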