PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0

PaddleOCR is very slow at reading data, but PaddleOCRSharp is fast #12031

Closed xiaolongzhuanshi closed 1 week ago

xiaolongzhuanshi commented 2 weeks ago

Please provide the following information to quickly locate the problem

Code:

    import time
    from paddleocr import PaddleOCR

    # `path` is the image path, defined elsewhere in the script
    ocr = PaddleOCR(use_angle_cls=True, use_gpu=True, lang="ch")
    start_time = time.time()
    result = ocr.ocr(path)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")

Output:

    [2024/04/30 14:36:13] ppocr DEBUG: dt_boxes num : 52, elapse : 0.46878957748413086
    [2024/04/30 14:36:14] ppocr DEBUG: cls num : 52, elapse : 0.9789741039276123
    [2024/04/30 14:36:29] ppocr DEBUG: rec_res num : 52, elapse : 14.643944025039673
    Execution time: 16.11536145210266 seconds

The rec_res (recognition) step is where it is especially slow.
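To put a number on that observation, here is a minimal sketch that parses the three DEBUG lines from the output above and prints each stage's share of the summed stage time. The regular expression and the helper names `stage_times` and `stage_shares` are illustrative and not part of PaddleOCR; the sketch assumes the log covers a single image, so each stage name appears once.

```python
import re

# The three DEBUG lines copied from the output in this issue.
LOG = """\
[2024/04/30 14:36:13] ppocr DEBUG: dt_boxes num : 52, elapse : 0.46878957748413086
[2024/04/30 14:36:14] ppocr DEBUG: cls num : 52, elapse : 0.9789741039276123
[2024/04/30 14:36:29] ppocr DEBUG: rec_res num : 52, elapse : 14.643944025039673
"""

# Assumed pattern, written to match the log format shown above.
STAGE_RE = re.compile(r"ppocr DEBUG: (\w+) num : (\d+), elapse : ([\d.]+)")


def stage_times(log_text):
    """Map each stage name to its elapsed seconds (one entry per stage)."""
    return {name: float(elapse) for name, _num, elapse in STAGE_RE.findall(log_text)}


def stage_shares(times):
    """Return each stage's fraction of the summed stage time."""
    total = sum(times.values())
    return {name: t / total for name, t in times.items()}


times = stage_times(LOG)
shares = stage_shares(times)
for name, t in times.items():
    print(f"{name:8s} {t:7.3f} s  {shares[name]:6.1%}")
```

With these figures, recognition accounts for roughly 91% of the time spent in the three stages, while detection and angle classification together take under 1.5 seconds.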

PaddleOCRSharp, by comparison, reports an average time of 0:00:03.898628.
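Note that the Python figure above is a single timed call, while the PaddleOCRSharp figure is reported as an average. A like-for-like comparison could time several Python calls after an untimed first call, as in the sketch below. The `benchmark` helper and the `fake_ocr` stand-in are hypothetical names used for illustration; whether first-call overhead explains any of the gap in this issue is not established here.

```python
import statistics
import time


def benchmark(fn, warmup=1, runs=5):
    """Call fn `warmup` times untimed, then return (mean seconds, samples) over `runs` timed calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), samples


def fake_ocr():
    """Stand-in workload so the sketch runs without PaddleOCR installed."""
    time.sleep(0.01)


mean_s, samples = benchmark(fake_ocr, warmup=1, runs=3)
print(f"mean over {len(samples)} runs: {mean_s:.4f} s")
```

To apply it to the code in this issue, pass the real call instead of the stand-in, for example `benchmark(lambda: ocr.ocr(path))`, using the `ocr` object and `path` from the post.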

Both were run on the same machine, and I don't know why they differ. Could it be that I have too many dependencies installed? Any help would be appreciated. Here are my dependencies:

    C:\Users\Administrator>pip list
    Package Version
    aiohttp 3.8.6
    aiosignal 1.3.1
    altgraph 0.17.4
    annotated-types 0.5.0
    anyio 3.7.1
    astor 0.8.1
    async-timeout 4.0.3
    asynctest 0.13.0
    attrdict 2.0.1
    attrdict3 2.0.2
    attrs 23.2.0
    auto-py-to-exe 2.43.3
    Babel 2.14.0
    bce-python-sdk 0.9.7
    beautifulsoup4 4.12.3
    bottle 0.12.25
    bottle-websocket 0.2.9
    cachetools 5.3.3
    certifi 2024.2.2
    cffi 1.15.1
    charset-normalizer 3.3.2
    ci-info 0.3.0
    click 8.1.7
    colorama 0.4.6
    coloredlogs 15.0.1
    colorlog 6.8.2
    configobj 5.0.8
    configparser 5.3.0
    cssselect 1.2.0
    cssutils 2.7.1
    cycler 0.11.0
    Cython 3.0.10
    datasets 2.13.2
    decorator 5.1.1
    dill 0.3.4
    Eel 0.16.0
    et-xmlfile 1.1.0
    etelemetry 0.3.1
    exceptiongroup 1.2.1
    fastapi 0.103.2
    filelock 3.12.2
    fire 0.6.0
    fitz 0.0.1.dev2
    Flask 2.2.5
    flask-babel 3.1.0
    Flask-Cors 4.0.0
    flatbuffers 24.3.25
    fonttools 4.38.0
    frozenlist 1.3.3
    fsspec 2023.1.0
    future 1.0.0
    gevent 22.10.2
    gevent-websocket 0.10.1
    greenlet 3.0.3
    h11 0.14.0
    httpcore 0.17.3
    httplib2 0.22.0
    httpx 0.24.1
    huggingface-hub 0.16.4
    humanfriendly 10.0
    idna 3.7
    imageio 2.31.2
    imgaug 0.4.0
    importlib-metadata 4.13.0
    isodate 0.6.1
    itsdangerous 2.1.2
    jieba 0.42.1
    Jinja2 3.1.3
    joblib 1.3.2
    kiwisolver 1.4.5
    lmdb 1.4.1
    looseversion 1.3.0
    lxml 5.2.1
    markdown-it-py 2.2.0
    MarkupSafe 2.1.5
    matplotlib 3.5.3
    mdurl 0.1.2
    mpmath 1.3.0
    multidict 6.0.5
    multiprocess 0.70.12.2
    networkx 2.6.3
    nibabel 4.0.2
    nipype 1.8.6
    numpy 1.21.6
    onnx 1.14.1
    onnxruntime 1.14.1
    opencv-contrib-python 4.6.0.66
    opencv-python 4.6.0.66
    opencv-python-headless 4.9.0.80
    openpyxl 3.1.2
    opt-einsum 3.3.0
    packaging 24.0
    paddle-bfloat 0.1.7
    paddle2onnx 1.0.6
    paddlefsl 1.1.0
    paddlenlp 2.6.1
    paddleocr 2.7.0.2
    paddlepaddle 2.5.2
    pandas 1.3.5
    pdf2docx 0.5.8
    pefile 2023.2.7
    Pillow 8.3.2
    pip 24.0
    premailer 3.10.0
    protobuf 3.20.2
    prov 2.0.0
    psutil 5.9.8
    pyarrow 12.0.1
    pyclipper 1.3.0.post5
    pycparser 2.21
    pycryptodome 3.20.0
    pydantic 2.5.3
    pydantic_core 2.14.6
    pydot 2.0.0
    Pygments 2.17.2
    pyinstaller 5.13.2
    pyinstaller-hooks-contrib 2024.5
    PyMuPDF 1.20.2
    pypandoc 1.13
    pyparsing 3.1.2
    pyreadline 2.1
    python-dateutil 2.9.0.post0
    python-docx 1.1.0
    pytz 2024.1
    PyWavelets 1.3.0
    pywin32-ctypes 0.2.2
    pyxnat 1.6.2
    PyYAML 6.0.1
    rapidfuzz 3.4.0
    rarfile 4.2
    rdflib 6.3.2
    requests 2.31.0
    rich 13.7.1
    safetensors 0.4.3
    scikit-image 0.19.3
    scikit-learn 1.0.2
    scipy 1.7.3
    sentencepiece 0.2.0
    seqeval 1.2.2
    setuptools 68.0.0
    shapely 2.0.4
    shellingham 1.5.4
    simplejson 3.19.2
    six 1.16.0
    sniffio 1.3.1
    soupsieve 2.4.1
    starlette 0.27.0
    sympy 1.10.1
    termcolor 2.3.0
    threadpoolctl 3.1.0
    tifffile 2021.11.2
    tqdm 4.66.2
    traits 6.3.2
    typer 0.12.3
    typing_extensions 4.7.1
    urllib3 1.25.11
    uvicorn 0.22.0
    visualdl 2.5.3
    Werkzeug 2.2.3
    whichcraft 0.6.1
    xxhash 3.4.1
    yacs 0.1.8
    yarl 1.9.4
    zipp 3.15.0
    zope.event 5.0
    zope.interface 6.3

TingquanGao commented 1 week ago

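You are using the inference example code provided in PaddleOCR, which is written in Python and has not been specifically optimized for speed. PaddleOCRSharp, by contrast, is developed in C++, so a difference in speed is expected.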