VikParuchuri / surya

OCR, layout analysis, reading order, line detection in 90+ languages
https://www.datalab.to
GNU General Public License v3.0

Getting an assertion error in the Tokenizer #24

Closed: andrekv17 closed this issue 6 months ago

andrekv17 commented 6 months ago

Hello! I'm trying to test surya on English and Russian documents, but I cannot run even the smallest example.

No matter which language configuration I tried (['ru'], ['en'], and ['ru', 'en']), I got the same error.

The code I used:

from PIL import Image
import pypdfium2 as pdfium
from surya.ocr import run_ocr
from surya.model.detection.segformer import load_model as load_det_model, load_processor as load_det_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor

img_path = str(paths[0])
pdf = pdfium.PdfDocument(img_path)
# Render each page at ~200 DPI and convert to a PIL image
images = [Image.fromarray(page.render(scale=200/72).to_numpy()) for page in pdf]
# image = Image.fromarray(images[0])
# langs = ['en', 'ru']
langs = ['ru']
det_processor, det_model = load_det_processor(), load_det_model()
rec_model, rec_processor = load_rec_model(), load_rec_processor()

predictions = run_ocr(images, langs, det_model, det_processor, rec_model, rec_processor)

The output from cell:

Loading detection model vikp/surya_det on device cuda with dtype torch.float16
Loading recognition model vikp/surya_rec on device cuda with dtype torch.float16
Detecting bboxes: 100%|██████████| 1/1 [00:01<00:00,  1.80s/it]
Recognizing Text:   0%|          | 0/1 [00:00<?, ?it/s]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[16], line 16
     13 det_processor, det_model = load_det_processor(), load_det_model()
     14 rec_model, rec_processor = load_rec_model(), load_rec_processor()
---> 16 predictions = run_ocr(images, langs, det_model, det_processor, rec_model, rec_processor)

File /opt/conda/envs/surya_ocr/lib/python3.9/site-packages/surya/ocr.py:66, in run_ocr(images, langs, det_model, det_processor, rec_model, rec_processor)
     63     all_slices.extend(slices)
     64     all_langs.extend([lang] * len(slices))
---> 66 rec_predictions = batch_recognition(all_slices, all_langs, rec_model, rec_processor)
     68 predictions_by_image = []
     69 slice_start = 0

File /opt/conda/envs/surya_ocr/lib/python3.9/site-packages/surya/recognition.py:31, in batch_recognition(images, languages, model, processor)
     29 batch_langs = languages[i:i+batch_size]
     30 batch_images = images[i:i+batch_size]
---> 31 model_inputs = processor(text=[""] * len(batch_langs), images=batch_images, lang=batch_langs)
     33 batch_pixel_values = model_inputs["pixel_values"]
     34 batch_langs = model_inputs["langs"]

File /opt/conda/envs/surya_ocr/lib/python3.9/site-packages/surya/model/recognition/processor.py:228, in SuryaProcessor.__call__(self, *args, **kwargs)
    225     inputs = self.image_processor(images, *args, **kwargs)
    227 if text is not None:
--> 228     encodings = self.tokenizer(text, lang, **kwargs)
    230 if text is None:
    231     return inputs

File /opt/conda/envs/surya_ocr/lib/python3.9/site-packages/surya/model/recognition/tokenizer.py:90, in Byt5LangTokenizer.__call__(self, texts, langs, pad_token_id, **kwargs)
     87     langs = [langs]
     89 # One language input per text input
---> 90 assert len(langs) == len(texts)
     92 for text, lang in zip(texts, langs):
     93     tokens, lang_list = _tokenize(text, lang)

AssertionError: 

pip freeze output:

annotated-types==0.6.0
asttokens==2.4.1
certifi==2024.2.2
charset-normalizer==3.3.2
comm==0.2.1
contourpy==1.2.0
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
exceptiongroup==1.2.0
executing==2.0.1
filelock==3.13.1
filetype==1.2.0
fonttools==4.48.1
fsspec==2024.2.0
huggingface-hub==0.20.3
idna==3.6
imageio==2.34.0
importlib-metadata==7.0.1
importlib-resources==6.1.1
ipykernel==6.29.2
ipython==8.18.1
ipywidgets==8.1.2
jedi==0.19.1
Jinja2==3.1.2
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyterlab_widgets==3.0.10
kiwisolver==1.4.5
MarkupSafe==2.1.3
matplotlib==3.8.2
matplotlib-inline==0.1.6
mpmath==1.3.0
nest-asyncio==1.6.0
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==8.7.0.84
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.3.0.86
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusparse-cu11==11.7.5.86
nvidia-nccl-cu11==2.19.3
nvidia-nvtx-cu11==11.8.86
opencv-python==4.9.0.80
packaging==23.2
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
platformdirs==4.2.0
prompt-toolkit==3.0.43
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pydantic==2.6.1
pydantic-settings==2.1.0
pydantic_core==2.16.2
Pygments==2.17.2
pyparsing==3.1.1
pypdfium2==4.27.0
python-dateutil==2.8.2
python-dotenv==1.0.1
PyYAML==6.0.1
pyzmq==25.1.2
regex==2023.12.25
requests==2.31.0
safetensors==0.4.2
six==1.16.0
stack-data==0.6.3
surya-ocr==0.2.1
sympy==1.12
tabulate==0.9.0
tokenizers==0.15.2
torch==2.2.0+cu118
torchaudio==2.2.0+cu118
torchvision==0.17.0+cu118
tornado==6.4
tqdm==4.66.2
traitlets==5.14.1
transformers==4.36.2
triton==2.2.0
typing_extensions==4.9.0
urllib3==2.2.0
wcwidth==0.2.13
widgetsnbextension==4.0.10
zipp==3.17.0
VikParuchuri commented 6 months ago

Hi, this is a small typo in the docs, which I need to fix. langs needs to be a list of lists, like predictions = run_ocr(images, [langs], det_model, det_processor, rec_model, rec_processor)
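
For completeness, here is a minimal sketch of the corrected call, reusing the variables from the snippet above. It assumes run_ocr expects one language list per input image (which matches how all_langs is extended per image in the traceback), so a multi-page PDF gets one entry per page:

# langs must be a list of lists: one list of language codes per input image
langs = [['ru']] * len(images)

predictions = run_ocr(images, langs, det_model, det_processor, rec_model, rec_processor)

With a single list per image, the tokenizer's assertion len(langs) == len(texts) is satisfied, since each text slice is paired with the language list of the page it came from.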

andrekv17 commented 6 months ago

Thank you for the fast response)