clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License

How to run Donut Base and Finetuned-CORD-v2 #218

gmork13 commented 1 year ago

I don't have the time to write on each issue, so I'll compile my findings here and you can do what you will.

To start things off, there are some quite specific version requirements. Start with:

```
pip install build zss lit transformers==4.24.0 timm==0.5.4 protobuf==3.20.3
```

Any version of torch seems to work. Install the other requirements.

FOR DONUT BASE:

Use the MAIN revision of the model repo, not the OFFICIAL one. Run it with code that imports DonutProcessor and VisionEncoderDecoderModel from transformers, as can be seen here: https://huggingface.co/spaces/nikhilba/donut-ocr/blob/main/app.py
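A minimal sketch of that path, following the transformers API (the image path is a placeholder, and the task prompt is a guess based on the SynthDoG pretraining task):

```python
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Load the main revision of donut-base from the Hugging Face Hub
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("sample.png").convert("RGB")  # placeholder image path
pixel_values = processor(image, return_tensors="pt").pixel_values

# donut-base is the pretraining checkpoint; "<s_synthdog>" is an assumption
# based on its SynthDoG text-reading pretraining task
decoder_input_ids = processor.tokenizer(
    "<s_synthdog>", add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values.to(device),
    decoder_input_ids=decoder_input_ids.to(device),
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)
print(processor.batch_decode(outputs.sequences)[0])
```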

FOR DONUT-FINETUNED-CORD-V2:

Use the OFFICIAL revision of the model repo, not the main one. Clone the clovaai/donut GitHub repo and pip install it so that you can import DonutModel from donut. Example code is here: https://huggingface.co/spaces/naver-clova-ix/donut-base-finetuned-cord-v2/blob/main/app.py
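A minimal sketch of that path, assuming the donut package is installed from the cloned repo (the revision kwarg selecting the original-codebase weights is an assumption based on this thread; the image path is a placeholder):

```python
import torch
from PIL import Image
from donut import DonutModel

# Assumption: the "official" revision on the Hub holds the checkpoint
# compatible with donut.DonutModel, while main holds transformers weights
model = DonutModel.from_pretrained(
    "naver-clova-ix/donut-base-finetuned-cord-v2", revision="official"
)

if torch.cuda.is_available():
    model.half()
    model.to("cuda")
model.eval()

image = Image.open("receipt.png").convert("RGB")  # placeholder image path
output = model.inference(image=image, prompt="<s_cord-v2>")
print(output["predictions"][0])
```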

If you run on CPU you might have to remove model.half(), and conversely enable it if you run on GPU. There's a somewhat sketchy if/else in there that defaults to fp16 depending on whether you're on GPU.
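The pattern looks roughly like this; a paraphrase of what the demos do rather than a quote from them:

```python
import torch

if torch.cuda.is_available():
    model.half()       # fp16 on GPU; half-precision ops are poorly supported on CPU
    model.to("cuda")
else:
    model.encoder.to(torch.bfloat16)  # some demos cast the encoder to bf16 on CPU instead
```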

A few more notes for donut-base:

- You might need to pad the image to get any output from it. Use PIL to pad quite a bit, so the text appears the way it would on a sheet of paper (see the sketch below).
- If you get repeated bad characters, you're probably trying to run donut-base by importing DonutModel.
- If you get the `_init_weights` issue, your requirements aren't on point, or you got the OFFICIAL revision when you should have gotten MAIN (or vice versa).
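A minimal padding sketch with PIL; the border size and fill color are arbitrary starting points to tune, not known-good values:

```python
from PIL import Image, ImageOps

image = Image.open("crop.png").convert("RGB")  # placeholder path

# Add a generous white margin so the text sits on a page-like background;
# 200px is a guess, adjust until the model starts producing output
padded = ImageOps.expand(image, border=200, fill="white")
padded.save("crop_padded.png")
```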

That's all I can remember for now, good luck.

bumi001 commented 10 months ago

Hi,

I tried the following:

```
sudo apt install git-lfs
git clone https://huggingface.co/spaces/naver-clova-ix/donut-base-finetuned-cord-v2
cd donut-base-finetuned-cord-v2
python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
python3 app.py
```

But I get the error you mentioned:

```
raise NotImplementedError(f"Make sure `_init_weights` is implemented for {self.__class__}")
NotImplementedError: Make sure `_init_weights` is implemented for <class 'donut.model.DonutModel'>
```

Is there something else I can try?

bumi001 commented 10 months ago

The link you provided only shows the main branch. I don't see an official branch there. I mean this link:

https://huggingface.co/spaces/naver-clova-ix/donut-base-finetuned-cord-v2/blob/main/app.py

bumi001 commented 10 months ago

I tried the following:

```
git clone -b official https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2
git clone https://github.com/clovaai/donut.git
cd donut
```

Then I created a test1.py file in the directory donut:

```python
import argparse
import json
import os
import re
from pathlib import Path

import numpy as np
import torch
from datasets import load_dataset
from PIL import Image
from tqdm import tqdm

from donut import DonutModel, JSONParseEvaluator, load_json, save_json

parser = argparse.ArgumentParser()
parser.add_argument("--task", type=str, default="cord-v2")
parser.add_argument(
    "--pretrained_model_name_or_path", type=str, default="../donut-base-finetuned-cord-v2"
)
args, left_argv = parser.parse_known_args()

task_name = args.task
task_prompt = f"<s_{task_name}>"
pretrained_model = DonutModel.from_pretrained(args.pretrained_model_name_or_path)
```

I just wanted to see if it would create the variable pretrained_model without any issues. However, I ran into the following errors:

```
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")

RuntimeError: Error(s) in loading state_dict for DonutModel:
    size mismatch for encoder.model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for encoder.model.layers.1.downsample.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for encoder.model.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for encoder.model.layers.2.downsample.norm.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for encoder.model.layers.2.downsample.norm.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for encoder.model.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
```
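The last line of the traceback suggests one thing to try. A hedged sketch only, and untested: the mismatched downsample layers would be randomly re-initialized rather than loaded, so this may silence the error without giving a usable model:

```python
from donut import DonutModel

# Untested: follows the traceback's own suggestion; the mismatched
# encoder downsample layers are re-initialized instead of loaded
pretrained_model = DonutModel.from_pretrained(
    "../donut-base-finetuned-cord-v2", ignore_mismatched_sizes=True
)
```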