clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

Continuing from checkpoint results in: ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group #180

Open csanadpoda opened 1 year ago

csanadpoda commented 1 year ago

I wanted to pretrain the model for a new language, so I trained it on a dataset for 30 epochs. During training, the logger showed 200 M trainable params. After checking the results, I decided to train it some more, so I copied the config YAML and modified it to point to my already-trained model stored locally.

This, however, added another 59 M params to the model, as the console now reports:

  | Name  | Type       | Params
-------------------------------------
0 | model | DonutModel | 259 M
-------------------------------------
259 M     Trainable params
0         Non-trainable params
259 M     Total params
1,039.623 Total estimated model params size (MB)

My initial model was just 800 MB with 200 M params. Is this intentional? If not, what might have changed it? I'm using the exact same config except for the path to the model I want to train.
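One way to narrow down where the extra 59 M parameters live is to compare per-submodule parameter counts between the two checkpoints. A minimal sketch, assuming both are standard PyTorch Lightning `.ckpt` files that wrap the weights under a `state_dict` key (the paths in the usage comment are placeholders):

```python
import torch

def param_counts(ckpt_path):
    """Return the total parameter count and a breakdown by submodule prefix."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Lightning checkpoints keep the weights under "state_dict"
    state = ckpt.get("state_dict", ckpt)
    totals = {}
    for name, tensor in state.items():
        # Group by the first two name components, e.g. "model.encoder"
        prefix = ".".join(name.split(".")[:2])
        totals[prefix] = totals.get(prefix, 0) + tensor.numel()
    return sum(totals.values()), totals

# Hypothetical usage -- compare the 200 M run against the 259 M run:
# total_old, by_module_old = param_counts("result/old_run/artifacts.ckpt")
# total_new, by_module_new = param_counts("result/new_run/artifacts.ckpt")
```

Diffing the two breakdowns should show whether the growth is concentrated in, say, the decoder embeddings (a larger vocabulary or max length) rather than spread across the whole model.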

csanadpoda commented 1 year ago

OK, I've noticed I hadn't specified the checkpoint path in the config YAML. Now that I have, pointing it at the artifacts.ckpt file, I get a different error: ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group. How do I get around this?
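This error usually means the trainer is restoring the optimizer state saved in the checkpoint, but the freshly built model exposes different parameter groups, which would be consistent with the 200 M → 259 M jump above. A common workaround is to load only the model weights and start training with a fresh optimizer instead of resuming the full trainer state. A sketch, assuming a standard Lightning `.ckpt` with a `state_dict` key whose entries are prefixed with the module attribute name (`model.` here); the usage lines are hypothetical:

```python
import torch

def load_weights_only(model, ckpt_path):
    """Copy matching weights from a Lightning checkpoint, skipping optimizer state."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)
    # Strip the Lightning "model." attribute prefix so keys match the bare module
    stripped = {k.split(".", 1)[1] if k.startswith("model.") else k: v
                for k, v in state.items()}
    # strict=False tolerates keys that changed between the two runs
    missing, unexpected = model.load_state_dict(stripped, strict=False)
    return missing, unexpected

# Hypothetical usage:
# donut_model = DonutModel.from_pretrained("result/old_run")
# load_weights_only(donut_model, "result/old_run/artifacts.ckpt")
# ...then launch training WITHOUT the resume/checkpoint path set,
# so a new optimizer is created for the current parameter groups.
```

The `missing`/`unexpected` key lists returned by `load_state_dict` also tell you exactly which tensors differ between the two runs.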