clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

Continuing from checkpoint results in: ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group #180

Open csanadpoda opened 1 year ago

csanadpoda commented 1 year ago

I wanted to pretrain the model for a new language, so I trained it on a dataset for 30 epochs. During training, the logger showed 200 M trainable params. After checking the results, I decided to train it some more, so I copied the config YAML and modified it to point to my already-trained model stored locally.

This, however, added another 59 M params to the model, as the console now reports:

  | Name  | Type       | Params
-------------------------------------
0 | model | DonutModel | 259 M
-------------------------------------
259 M     Trainable params
0         Non-trainable params
259 M     Total params
1,039.623 Total estimated model params size (MB)

My initial model was just 800 MB with 200 M params. Is this intentional? If not, what might have changed it? I'm using the exact same config except for the path to the model I want to train.
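One way to narrow down where the extra 59 M parameters live is to compare per-submodule parameter counts between the two checkpoints. A minimal sketch, assuming both are standard PyTorch Lightning `.ckpt` files that wrap the weights under a `state_dict` key (the paths in the usage comment are placeholders):

```python
import torch

def param_counts(ckpt_path):
    """Return the total parameter count and a breakdown by submodule prefix."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Lightning checkpoints keep the weights under "state_dict"
    state = ckpt.get("state_dict", ckpt)
    totals = {}
    for name, tensor in state.items():
        # Group by the first two name components, e.g. "model.encoder"
        prefix = ".".join(name.split(".")[:2])
        totals[prefix] = totals.get(prefix, 0) + tensor.numel()
    return sum(totals.values()), totals

# Hypothetical usage -- compare the 200 M run against the 259 M run:
# total_old, by_module_old = param_counts("result/old_run/artifacts.ckpt")
# total_new, by_module_new = param_counts("result/new_run/artifacts.ckpt")
```

Diffing the two breakdowns should show whether the growth is concentrated in, say, the decoder embeddings (a larger vocabulary or max length) rather than spread across the whole model.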

csanadpoda commented 1 year ago

OK, I've noticed I hadn't specified the checkpoint path in the config YAML. Now that I have, pointing it at the artifacts.ckpt file, I get a different error: ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group. How do I get around this?
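This error usually means the trainer is restoring the optimizer state saved in the checkpoint, but the freshly built model exposes different parameter groups, which would be consistent with the 200 M → 259 M jump above. A common workaround is to load only the model weights and start training with a fresh optimizer instead of resuming the full trainer state. A sketch, assuming a standard Lightning `.ckpt` with a `state_dict` key whose entries are prefixed with the module attribute name (`model.` here); the usage lines are hypothetical:

```python
import torch

def load_weights_only(model, ckpt_path):
    """Copy matching weights from a Lightning checkpoint, skipping optimizer state."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    # Strip the Lightning "model." attribute prefix so keys match the bare module
    stripped = {k.split(".", 1)[1] if k.startswith("model.") else k: v
                for k, v in state.items()}
    # strict=False tolerates keys that changed between the two runs
    missing, unexpected = model.load_state_dict(stripped, strict=False)
    return missing, unexpected

# Hypothetical usage:
# donut_model = DonutModel.from_pretrained("result/old_run")
# load_weights_only(donut_model, "result/old_run/artifacts.ckpt")
# ...then launch training WITHOUT the resume/checkpoint path set,
# so a new optimizer is created for the current parameter groups.
```

The `missing`/`unexpected` key lists returned by `load_state_dict` also tell you exactly which tensors differ between the two runs.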