Justin-Tan / high-fidelity-generative-compression

Pytorch implementation of High-Fidelity Generative Image Compression + Routines for neural image compression
Apache License 2.0
411 stars · 77 forks

Error in loading optimizers state dict: optimizers['amort'].load_state_dict(checkpoint['compression_optimizer_state_dict']) #20

Closed ahmedfgad closed 3 years ago

ahmedfgad commented 3 years ago

Hi,

I am trying to fine-tune the pre-trained model hific_low.pt, but I get an error at the following line in the load_model() function in the utils.py module:

optimizers['amort'].load_state_dict(checkpoint['compression_optimizer_state_dict'])

Here are more details about the error:

Traceback (most recent call last):
  File "train.py", line 283, in <module>
    model_type=args.model_type, current_args_d=dictify(args), strict=False, prediction=False)
  File "D:\Research\Image Compression\HiFiC\high-fidelity-generative-compression-master\src\helpers\utils.py", line 261, in load_model
    optimizers['amort'].load_state_dict(checkpoint['compression_optimizer_state_dict'])
  File "C:\Users\Ahmed Fawzy Gad\AppData\Roaming\Python\Python37\site-packages\torch\optim\optimizer.py", line 123, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

Please help me solve it.

Thank you.

ahmedfgad commented 3 years ago

I figured out where the problem is. I compared the state_dict expected by the optimizer with the one loaded from the checkpoint:

print(optimizers['amort'].state_dict()["param_groups"][0])
print(checkpoint['compression_optimizer_state_dict']["param_groups"][0])

The state_dict expected by the optimizer is given below. There are 136 elements in the list assigned to the params key:

{'lr': 0.0001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'params': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135]}

However, the checkpoint's state_dict has only 120 elements in the list assigned to the params key:

{'lr': 0.0001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'params': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]}
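The comparison above can be done programmatically instead of by eye. A minimal sketch, using plain dicts shaped like PyTorch optimizer state dicts (helper names are hypothetical, not from the repository):

```python
def param_group_sizes(state_dict):
    """Return the number of params in each param group of an optimizer state_dict."""
    return [len(g["params"]) for g in state_dict["param_groups"]]

def groups_match(expected_sd, checkpoint_sd):
    """True if every param group holds the same number of parameters."""
    return param_group_sizes(expected_sd) == param_group_sizes(checkpoint_sd)

# Toy state dicts mimicking the sizes reported above (136 vs 120 params).
expected = {"param_groups": [{"lr": 1e-4, "params": list(range(136))}]}
loaded = {"param_groups": [{"lr": 1e-4, "params": list(range(120))}]}

print(param_group_sizes(expected))     # [136]
print(param_group_sizes(loaded))       # [120]
print(groups_match(expected, loaded))  # False
```

A size mismatch in any param group is exactly the condition that makes torch's Optimizer.load_state_dict raise the ValueError shown above.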

Because of this size mismatch, the following error is raised:

ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

By inspecting the state_dict of the checkpoint and the optimizer, I found that they are identical except for the value assigned to the state key.

So, I solved the issue by replacing the following line:

optimizers['amort'].load_state_dict(checkpoint['compression_optimizer_state_dict'])

with the following line:

optimizers['amort'].load_state_dict(checkpoint['compression_optimizer_state_dict'])

Please correct me if I made a mistake.
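Since the exact replacement line was not posted, one common workaround for this class of error (not the repository's own fix) is to skip restoring the optimizer state when it is incompatible and continue with a freshly initialised optimizer. A hedged sketch, with a hypothetical helper name:

```python
import torch
import torch.nn as nn

def load_optimizer_state_if_compatible(optimizer, state_dict):
    """Try to restore optimizer state from a checkpoint; if the checkpoint's
    param groups do not match the current model, keep the fresh optimizer
    instead of crashing."""
    try:
        optimizer.load_state_dict(state_dict)
        return True
    except ValueError as err:
        print(f"Skipping incompatible optimizer state: {err}")
        return False

# Demo: two models with different parameter counts, as in the issue.
small = nn.Linear(4, 4)
big = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
opt_small = torch.optim.Adam(small.parameters(), lr=1e-4)
opt_big = torch.optim.Adam(big.parameters(), lr=1e-4)

checkpoint_state = opt_small.state_dict()
print(load_optimizer_state_if_compatible(opt_big, checkpoint_state))    # False
print(load_optimizer_state_if_compatible(opt_small, checkpoint_state))  # True
```

Note that dropping the optimizer state loses the Adam moment estimates, so fine-tuning effectively restarts the optimizer even though the model weights are restored.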

Justin-Tan commented 3 years ago

Hi, thanks for bringing this up. I think I changed the model definitions to speed up the ANS encoding process after publishing the pretrained models, which explains the mismatch between the model and the checkpoints. If you want to use the pretrained models, you can safely use commits from before September 12, 2020, but I would recommend training from scratch with the new models. Sorry for the confusion.

ahmedfgad commented 3 years ago

Thank you :)

Olin1461 commented 1 year ago

I met the same error when trying to run a pretrained model. May I ask which exact line of code you changed to fix the problem? Thanks!