clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

What are acceptable input_size values? #107

Open htcml opened 1 year ago

htcml commented 1 year ago

I tried input_size = [480, 480] and train.py gives me the following error:


Traceback (most recent call last):
  File "train.py", line 149, in <module>
    train(config)
  File "train.py", line 57, in train
    model_module = DonutModelPLModule(config)
  File "/home/thuan/donut/donut/lightning_module.py", line 35, in __init__
    ignore_mismatched_sizes=True,
  File "/home/thuan/donut/donut/donut/model.py", line 595, in from_pretrained
    model = super(DonutModel, cls).from_pretrained(pretrained_model_name_or_path, revision="official", *model_args, **kwargs)
  File "/axp/aida/data/platformds/aiservices/conda/envs/donut/lib/python3.7/site-packages/transformers/modeling_utils.py", line 2230, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/thuan/donut/donut/donut/model.py", line 387, in __init__
    name_or_path=self.config.name_or_path,
  File "/home/thuan/donut/donut/donut/model.py", line 70, in __init__
    num_classes=0,
  File "/axp/aida/data/platformds/aiservices/conda/envs/donut/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 500, in __init__
    downsample=PatchMerging if (i < self.num_layers - 1) else None
  File "/axp/aida/data/platformds/aiservices/conda/envs/donut/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 408, in __init__
    for i in range(depth)])
  File "/axp/aida/data/platformds/aiservices/conda/envs/donut/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 408, in <listcomp>
    for i in range(depth)])
  File "/axp/aida/data/platformds/aiservices/conda/envs/donut/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 281, in __init__
    mask_windows = window_partition(img_mask, self.window_size)  # num_win, window_size, window_size, 1
  File "/axp/aida/data/platformds/aiservices/conda/envs/donut/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 111, in window_partition
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[1, 1, 10, 1, 10, 1]' is invalid for input of size 225

zhaohm14 commented 6 months ago

I'm encountering the same issue as you. Has this problem been resolved?

Wyzix33 commented 6 months ago

Try using a multiple of 320 for each input_size value.
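
A rough way to see where the 320 likely comes from, and to check a candidate input_size before launching train.py, is sketched below. It is not taken from the Donut code. It assumes a Swin encoder with patch size 4, four stages (each later stage halving the token grid) and window size 10. Only the window size is visible in the traceback above; the other two values are assumptions, and the helper names are hypothetical.

```python
# Assumed encoder settings (only WINDOW_SIZE is visible in the traceback;
# PATCH_SIZE and NUM_STAGES are assumptions for this sketch).
PATCH_SIZE = 4
NUM_STAGES = 4
WINDOW_SIZE = 10


def stage_grid_sizes(side):
    """Token-grid length along one image side at each stage (halved per stage)."""
    grid = side // PATCH_SIZE
    return [grid // (2 ** i) for i in range(NUM_STAGES)]


def is_compatible_side(side):
    """True if every stage's grid splits evenly into WINDOW_SIZE windows."""
    total_stride = PATCH_SIZE * 2 ** (NUM_STAGES - 1)  # 32 with the values above
    if side % total_stride != 0:
        return False
    return all(g % WINDOW_SIZE == 0 for g in stage_grid_sizes(side))


def is_compatible_input_size(input_size):
    """Check both dimensions of an input_size pair such as [480, 480]."""
    return all(is_compatible_side(s) for s in input_size)


if __name__ == "__main__":
    for size in ([480, 480], [640, 640], [1280, 960], [2560, 1920]):
        print(size, stage_grid_sizes(size[0]), is_compatible_input_size(size))
```

Under these assumptions, 480 gives a last-stage grid of 15 per side, which cannot be split into windows of 10, and 15 x 15 = 225 matches the "input of size 225" in the error. A side that is a multiple of 4 x 8 x 10 = 320 keeps every stage divisible by 10. The check is deliberately strict: depending on the timm version, some smaller sizes may also be accepted.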