Closed nospotfer closed 2 months ago
I managed to work around it by simply adding `**kwargs` to DocumentBuilder's `__init__()` like this:
def __init__(
    self,
    resolve_lines: bool = True,
    resolve_blocks: bool = True,
    paragraph_break: float = 0.035,
    export_as_straight_boxes: bool = False,
    **kwargs: Any,  # accepts (and ignores) any keyword the builder does not know
) -> None:
    ...  # rest of the method unchanged
But perhaps there's a more elegant solution.
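One caveat with this workaround, sketched below with stand-in classes (`StrictBuilder`, `LenientBuilder` and `build` are hypothetical names, not OnnxTR code): once the builder accepts arbitrary keywords, a misspelled argument no longer raises and the default value is used without any warning. The default `"crnn_vgg16_bn"` is taken from the signature shown in the traceback of this thread; everything else is an assumption for illustration.

```python
from typing import Any, Tuple


class StrictBuilder:
    """Stand-in for a builder whose __init__ rejects unknown keywords."""

    def __init__(self, resolve_lines: bool = True, paragraph_break: float = 0.035) -> None:
        self.resolve_lines = resolve_lines
        self.paragraph_break = paragraph_break


class LenientBuilder(StrictBuilder):
    """Same builder with the **kwargs workaround applied."""

    def __init__(self, resolve_lines: bool = True, paragraph_break: float = 0.035, **kwargs: Any) -> None:
        super().__init__(resolve_lines, paragraph_break)
        # Unknown keywords are dropped; they are stored here only to show what was lost.
        self.ignored = dict(kwargs)


def build(builder_cls: type, reco_arch: str = "crnn_vgg16_bn", **kwargs: Any) -> Tuple[str, Any]:
    """Hypothetical factory: consumes reco_arch and forwards the remaining keywords."""
    return reco_arch, builder_cls(**kwargs)


# The misspelled keyword is accepted, so the requested architecture is never used.
arch, builder = build(LenientBuilder, rec_arch="crnn_mobilenet_v3_large")
print(arch)             # the default architecture, not the requested one
print(builder.ignored)  # the misspelled keyword ended up here

# The strict variant fails loudly instead.
try:
    build(StrictBuilder, rec_arch="crnn_mobilenet_v3_large")
    strict_raised = False
except TypeError:
    strict_raised = True
print(strict_raised)
```

In this sketch the lenient variant hides the typo while the strict one surfaces it, which suggests the `TypeError` is worth keeping.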
Hey @nospotfer :wave:,
Thanks for reporting! Could you please add a full working code snippet to reproduce the error? :)
@nospotfer sorry, it's a typo in the README xD: it should be `reco_arch` instead of `rec_arch`.
Updated :+1:
Great! In any case, here's the snippet I used:
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

doc = DocumentFile.from_pdf("/path/to/pdf_file.pdf")
model = ocr_predictor(
    det_arch='db_resnet50',  # detection architecture
    rec_arch='crnn_mobilenet_v3_large',  # recognition architecture (typo from the README: the parameter is `reco_arch`)
    det_bs=4,  # detection batch size
    reco_bs=1024,  # recognition batch size
    assume_straight_pages=True,  # set to `False` if the pages are not straight (rotation, perspective, etc.) (default: True)
    straighten_pages=False,  # set to `True` if the pages should be straightened before final processing (default: False)
    preserve_aspect_ratio=True,  # set to `False` if the aspect ratio should not be preserved (default: True)
    symmetric_pad=True,  # set to `False` to disable symmetric padding (default: True)
    # DocumentBuilder specific parameters
    resolve_lines=True,  # whether words should be automatically grouped into lines (default: True)
    resolve_blocks=True,  # whether lines should be automatically grouped into blocks (default: True)
    paragraph_break=0.035,  # relative length of the minimum space separating paragraphs (default: 0.035)
)
ocr_response = model(doc)
json_response = ocr_response.export()
text_response = ocr_response.render()
Just in case it can be useful for someone else
Also added a first small benchmark at the bottom of the README :)
The issue is still present with 0.1.2.
Steps to reproduce:
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

model = ocr_predictor(
    det_arch='db_resnet50',  # detection architecture
    rec_arch='crnn_mobilenet_v3_large',  # recognition architecture
    det_bs=4,  # detection batch size
    reco_bs=1024,  # recognition batch size
    assume_straight_pages=True,  # set to `False` if the pages are not straight (rotation, perspective, etc.) (default: True)
    straighten_pages=False,  # set to `True` if the pages should be straightened before final processing (default: False)
    # Preprocessing related parameters
    preserve_aspect_ratio=True,  # set to `False` if the aspect ratio should not be preserved (default: True)
    symmetric_pad=True,  # set to `False` to disable symmetric padding (default: True)
    # Additional parameters - meta information
    detect_orientation=False,  # set to `True` if the orientation of the pages should be detected (default: False)
    detect_language=False,  # set to `True` if the language of the pages should be detected (default: False)
    # DocumentBuilder specific parameters
    resolve_lines=True,  # whether words should be automatically grouped into lines (default: True)
    resolve_blocks=True,  # whether lines should be automatically grouped into blocks (default: True)
    paragraph_break=0.035,  # relative length of the minimum space separating paragraphs (default: 0.035)
)
Traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 4
1 from onnxtr.io import DocumentFile
2 from onnxtr.models import ocr_predictor
----> 4 model = ocr_predictor(
5 det_arch='db_resnet50', # detection architecture
6 rec_arch='crnn_mobilenet_v3_large', # recognition architecture
7 det_bs=4, # detection batch size
8 reco_bs=1024, # recognition batch size
9 assume_straight_pages=True, # set to `False` if the pages are not straight (rotation, perspective, etc.) (default: True)
10 straighten_pages=False, # set to `True` if the pages should be straightened before final processing (default: False)
11 # Preprocessing related parameters
12 preserve_aspect_ratio=True, # set to `False` if the aspect ratio should not be preserved (default: True)
13 symmetric_pad=True, # set to `False` to disable symmetric padding (default: True)
14 # Additional parameters - meta information
15 detect_orientation=False, # set to `True` if the orientation of the pages should be detected (default: False)
16 detect_language=False, # set to `True` if the language of the pages should be detected (default: False)
17 # DocumentBuilder specific parameters
18 resolve_lines=True, # whether words should be automatically grouped into lines (default: True)
19 resolve_blocks=True, # whether lines should be automatically grouped into blocks (default: True)
20 paragraph_break=0.035, # relative length of the minimum space separating paragraphs (default: 0.035)
21 )
File ~/anaconda3/envs/unsloth/lib/python3.10/site-packages/onnxtr/models/zoo.py:103, in ocr_predictor(det_arch, reco_arch, assume_straight_pages, preserve_aspect_ratio, symmetric_pad, export_as_straight_boxes, detect_orientation, straighten_pages, detect_language, **kwargs)
56 def ocr_predictor(
57 det_arch: Any = "fast_base",
58 reco_arch: Any = "crnn_vgg16_bn",
(...)
66 **kwargs: Any,
67 ) -> OCRPredictor:
68 """End-to-end OCR architecture using one model for localization, and another for text recognition.
69
70 >>> import numpy as np
(...)
101 OCR predictor
102 """
--> 103 return _predictor(
104 det_arch,
105 reco_arch,
106 assume_straight_pages=assume_straight_pages,
107 preserve_aspect_ratio=preserve_aspect_ratio,
108 symmetric_pad=symmetric_pad,
109 export_as_straight_boxes=export_as_straight_boxes,
110 detect_orientation=detect_orientation,
111 straighten_pages=straighten_pages,
112 detect_language=detect_language,
113 **kwargs,
114 )
File ~/anaconda3/envs/unsloth/lib/python3.10/site-packages/onnxtr/models/zoo.py:43, in _predictor(det_arch, reco_arch, assume_straight_pages, preserve_aspect_ratio, symmetric_pad, det_bs, reco_bs, detect_orientation, straighten_pages, detect_language, **kwargs)
37 # Recognition
38 reco_predictor = recognition_predictor(
39 reco_arch,
40 batch_size=reco_bs,
41 )
---> 43 return OCRPredictor(
44 det_predictor,
45 reco_predictor,
46 assume_straight_pages=assume_straight_pages,
47 preserve_aspect_ratio=preserve_aspect_ratio,
48 symmetric_pad=symmetric_pad,
49 detect_orientation=detect_orientation,
50 straighten_pages=straighten_pages,
51 detect_language=detect_language,
52 **kwargs,
53 )
File ~/anaconda3/envs/unsloth/lib/python3.10/site-packages/onnxtr/models/predictor/predictor.py:57, in OCRPredictor.__init__(self, det_predictor, reco_predictor, assume_straight_pages, straighten_pages, preserve_aspect_ratio, symmetric_pad, detect_orientation, detect_language, **kwargs)
55 self.det_predictor = det_predictor
56 self.reco_predictor = reco_predictor
---> 57 _OCRPredictor.__init__(
58 self, assume_straight_pages, straighten_pages, preserve_aspect_ratio, symmetric_pad, **kwargs
59 )
60 self.detect_orientation = detect_orientation
61 self.detect_language = detect_language
File ~/anaconda3/envs/unsloth/lib/python3.10/site-packages/onnxtr/models/predictor/base.py:50, in _OCRPredictor.__init__(self, assume_straight_pages, straighten_pages, preserve_aspect_ratio, symmetric_pad, **kwargs)
48 self.straighten_pages = straighten_pages
49 self.crop_orientation_predictor = None if assume_straight_pages else crop_orientation_predictor()
---> 50 self.doc_builder = DocumentBuilder(**kwargs)
51 self.preserve_aspect_ratio = preserve_aspect_ratio
52 self.symmetric_pad = symmetric_pad
TypeError: DocumentBuilder.__init__() got an unexpected keyword argument 'rec_arch'
onnxTR version: 0.1.2
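The traceback above shows the unknown keyword travelling through several layers of `**kwargs` before the innermost constructor rejects it. A minimal sketch of that forwarding pattern follows; `Builder`, `Predictor` and `make_predictor` are simplified stand-ins invented for illustration and do not reproduce OnnxTR's real classes or signatures.

```python
from typing import Any


class Builder:
    """Stand-in for the innermost class; it accepts only the keywords it knows."""

    def __init__(self, resolve_lines: bool = True, resolve_blocks: bool = True) -> None:
        self.resolve_lines = resolve_lines
        self.resolve_blocks = resolve_blocks


class Predictor:
    """Stand-in for the middle layer; leftover keywords go straight to Builder."""

    def __init__(self, reco_arch: str, **kwargs: Any) -> None:
        self.reco_arch = reco_arch
        self.doc_builder = Builder(**kwargs)


def make_predictor(reco_arch: str = "crnn_vgg16_bn", **kwargs: Any) -> Predictor:
    """Stand-in for the public factory; it consumes reco_arch and forwards the rest."""
    return Predictor(reco_arch, **kwargs)


message = ""
try:
    # 'rec_arch' matches no named parameter, so it falls into **kwargs at every level.
    make_predictor(rec_arch="crnn_mobilenet_v3_large", resolve_lines=True)
except TypeError as exc:
    message = str(exc)
print(message)

# With the correct spelling the factory consumes the argument itself.
ok = make_predictor(reco_arch="crnn_mobilenet_v3_large", resolve_lines=True)
print(ok.reco_arch)
```

The error is reported by the last class in the chain, which is why the message names DocumentBuilder even though the misspelled argument was meant for `ocr_predictor`.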
@nospotfer it's `reco_arch`, not `rec_arch` :)
Take a look, I have updated the README :)
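For anyone hitting a similar near-miss spelling, here is a small standard-library sketch that compares keyword names against a function's signature and proposes close matches. `suggest_keywords` and `predictor_stub` are hypothetical helpers written for this comment, not part of OnnxTR; the stub only mirrors the two architecture parameters and defaults visible in the traceback.

```python
import difflib
import inspect
from typing import Any, Callable, Dict, List


def suggest_keywords(func: Callable[..., Any], **kwargs: Any) -> Dict[str, List[str]]:
    """Map each keyword that is not a named parameter of func to similarly spelled parameters."""
    known = [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind not in (param.VAR_POSITIONAL, param.VAR_KEYWORD)
    ]
    return {
        key: difflib.get_close_matches(key, known, n=3, cutoff=0.6)
        for key in kwargs
        if key not in known
    }


def predictor_stub(det_arch: str = "fast_base", reco_arch: str = "crnn_vgg16_bn", **kwargs: Any) -> None:
    """Hypothetical stand-in for a factory that forwards extra keywords elsewhere."""


hints = suggest_keywords(
    predictor_stub,
    det_arch="db_resnet50",
    rec_arch="crnn_mobilenet_v3_large",
)
print(hints)
```

Keywords that come back with an empty suggestion list are probably meant to be forwarded (for example builder options), so a check like this is better suited to a warning than to a hard error.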
You're completely right! It works like a charm. Thank you!
I'm already planning to experiment a bit next week with the optimizations available in onnxruntime and with 8-bit quantization to further reduce inference latency on CPU :)
I'll be very happy to give it a try! In any case, docTR on CPU with ONNX, even without quantization, is already blazing fast.
@nospotfer 8-bit available now: https://github.com/felixdittrich92/OnnxTR/releases/tag/v0.2.0 Feel free to test it 🤗
I'll give it a try as soon as I can!
Bug description
DocumentBuilder does not accept a `rec_arch` argument, but it receives one because unknown keyword arguments are forwarded to it through `**kwargs`.
Code snippet to reproduce the bug
Error traceback
Environment
OS: Ubuntu 22.04
Python version: 3.11
Library version:
Onnxruntime version: 1.17