clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
https://arxiv.org/abs/2111.15664
MIT License
5.75k stars 466 forks

Complete text #253

Open VagnerBelfort opened 1 year ago

VagnerBelfort commented 1 year ago

Hello! How do I extract the full text from an image?

Thanks!!!

VagnerBelfort commented 1 year ago

With the code below I was able to extract text from an image, but the output did not contain the full text.

from donut import DonutModel
import torch
from PIL import Image

# Load the pre-trained base checkpoint.
pretrained_model = DonutModel.from_pretrained("naver-clova-ix/donut-base")
if torch.cuda.is_available():
    # On GPU: run the whole model in half precision.
    pretrained_model.half()
    device = torch.device("cuda")
    pretrained_model.to(device)
else:
    # On CPU: cast only the encoder to bfloat16.
    pretrained_model.encoder.to(torch.bfloat16)
pretrained_model.eval()

# The task prompt is the start token for the "synthdog" task.
task_name = "synthdog"
task_prompt = f"<s_{task_name}>"

input_img = Image.open("text.jpg")
# Run inference and keep the first (only) prediction.
output = pretrained_model.inference(image=input_img, prompt=task_prompt)["predictions"][0]
print(output)

vikasr111 commented 7 months ago

@VagnerBelfort did you have any success extracting the complete text? I have been looking for the same thing but haven't found anything concrete.

VagnerBelfort commented 7 months ago

Hi @vikasr111! No, I couldn't. =(
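
The thread does not establish why the output is cut short. One possibility, not confirmed here, is that a dense page exceeds what the model can decode in a single pass. If that is the case, a workaround worth trying is to split the image into horizontal strips, run the same `inference` call from the snippet above on each strip, and join the results. The sketch below is only an illustration under those assumptions: `strip_boxes` and `read_in_strips` are hypothetical helper names, the default `strip_height` and `overlap` values are arbitrary, and the `"text_sequence"` key is an assumption about the shape of the prediction.

```python
def strip_boxes(width, height, strip_height, overlap):
    """Return (left, upper, right, lower) crop boxes covering the image top to bottom.

    Consecutive strips share `overlap` pixels so that a text line cut at a
    strip boundary has a chance of appearing whole in the next strip.
    """
    if strip_height <= 0 or not 0 <= overlap < strip_height:
        raise ValueError("need strip_height > 0 and 0 <= overlap < strip_height")
    boxes = []
    top = 0
    step = strip_height - overlap
    while True:
        bottom = min(top + strip_height, height)
        boxes.append((0, top, width, bottom))
        if bottom >= height:
            break
        top += step
    return boxes


def read_in_strips(model, image, prompt, strip_height=640, overlap=40):
    """Run model.inference on each strip of a PIL-style image and join the text.

    `model` is expected to behave like the DonutModel in the snippet above
    (an `inference(image=..., prompt=...)` method returning a dict with a
    "predictions" list). Default sizes are arbitrary placeholders.
    """
    width, height = image.size
    parts = []
    for box in strip_boxes(width, height, strip_height, overlap):
        prediction = model.inference(image=image.crop(box), prompt=prompt)["predictions"][0]
        # Assumption: the prediction is a dict carrying the text under
        # "text_sequence"; anything else is converted to a string as-is.
        if isinstance(prediction, dict):
            text = prediction.get("text_sequence", "")
        else:
            text = str(prediction)
        if text:
            parts.append(text)
    return "\n".join(parts)
```

With the variables from the original snippet, this would be called as `read_in_strips(pretrained_model, input_img, task_prompt)`. Because the strips overlap, lines near a boundary may be read twice, and a strip edge can still cut through a line of text, so the joined output would likely need deduplication and a manual check.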