clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

The latest update's model weights have twice the embedding dim size of the model installed through GitHub or pip #283

Open Samartha27 opened 8 months ago

Samartha27 commented 8 months ago

```
---> 12 pretrained_model = DonutModel.from_pretrained(args.pretrained_path)
     13
     14 if torch.cuda.is_available():

2 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in _load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype, is_quantized, keep_in_fp32_modules)
   3929                 "\n\tYou may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method."
   3930             )
-> 3931         raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
   3932
   3933     if is_quantized:

RuntimeError: Error(s) in loading state_dict for DonutModel:
	size mismatch for encoder.model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.model.layers.1.downsample.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for encoder.model.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]).
	size mismatch for encoder.model.layers.2.downsample.norm.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for encoder.model.layers.2.downsample.norm.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for encoder.model.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
```
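Note the pattern: every mismatched downsample tensor in the checkpoint is exactly twice the size the instantiated Swin encoder expects, which suggests the encoder is being built with different dimensions than the one the weights were saved from (see the version pin in the replies below). A minimal diagnostic sketch for anyone hitting a similar mismatch; the checkpoint path `pytorch_model.bin` is an assumption, point it at wherever the pretrained Donut weights were downloaded:

```python
import torch

# Inspect the raw checkpoint without instantiating the model, so the
# load cannot fail. The path below is a hypothetical local copy of the
# pretrained Donut weights.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Print the shapes of the Swin downsample (patch-merging) tensors that
# appear in the error message, to compare against the model's shapes.
for name, tensor in state_dict.items():
    if "downsample" in name:
        print(f"{name}: {tuple(tensor.shape)}")
```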

xiaochu1131 commented 8 months ago

Hello, I've run into the same problem. May I ask how you solved it?

xiaochu1131 commented 8 months ago

```
!pip install transformers==4.25.1
!pip install pytorch-lightning==1.6.4
!pip install timm==0.5.4
!pip install gradio
!pip install donut-python
```

It seems to be a package version problem. After pinning transformers to 4.25.1 and timm to 0.5.4, the error went away.
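As a quick sanity check after repinning, a small sketch to confirm the pinned versions are the ones actually imported and then retry the load; the model ID `naver-clova-ix/donut-base` is just an example of a published Donut checkpoint, substitute your own pretrained path:

```python
import pytorch_lightning
import timm
import transformers
from donut import DonutModel

# Confirm the pinned versions are active (in Colab this typically
# requires restarting the runtime after the pip installs above).
print(transformers.__version__)       # expect 4.25.1
print(timm.__version__)               # expect 0.5.4
print(pytorch_lightning.__version__)  # expect 1.6.4

# Retry the load that previously raised the size-mismatch RuntimeError.
model = DonutModel.from_pretrained("naver-clova-ix/donut-base")
print(type(model).__name__)  # DonutModel
```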