PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Deployed PaddleOCR api suddenly results in incorrect OCR text results #13884

Open DipanshuJuneja opened 3 days ago

DipanshuJuneja commented 3 days ago

🐛 Bug (Problem Description)

I deployed a PaddleOCR API via FastAPI on Google Cloud Run. It returned correct results for API calls until yesterday, but since today the service has been returning gibberish output. I'm not sure what happened; I have not deployed any new version. When I run the app locally, it works correctly every time.

🏃‍♂️ Environment (Runtime Environment)

paddleocr==2.7.3
paddlepaddle==2.6.1
fastapi==0.111.0

🌰 Minimal Reproducible Example

Below is my initialization code. Note that the models are already downloaded into my Docker image.

import os

from paddleocr import PaddleOCR

# Module-level OCR instance; the det/rec/cls models are read from
# directories that were downloaded into the Docker image at build time.
# def ocr_model():
ocr = PaddleOCR(use_angle_cls=True, lang='en', enable_mkldnn=True, recovery=True,
                det_model_dir=os.path.abspath('fast_api_server/ocr_models/det'),
                rec_model_dir=os.path.abspath('fast_api_server/ocr_models/rec'),
                cls_model_dir=os.path.abspath('fast_api_server/ocr_models/cls'))

jingsongliujing commented 2 days ago

Can you provide screenshots of any error messages or output logs?

DipanshuJuneja commented 2 days ago

Hi @jingsongliujing, I'm not seeing any error messages or anything unusual in my output logs. The service returns a 200 status code; the only problem is that the OCR text output has stopped making sense since today. I'm attaching the file and the text returned for your reference. Note that the same file gave correct results until yesterday.

Linkedin_1.pdf

Output_OCR

jingsongliujing commented 2 days ago

Oh, I need to see the difference between the normal and abnormal output to identify the issue. Based on the information provided so far, we are unable to pinpoint the exact cause of the anomaly.
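
One way to produce that comparison is to save the text returned by a working run (for example the local one) and the text returned by the failing deployment, then diff them. The following is only a minimal sketch using Python's standard `difflib`; the function names, the `local_ok` / `cloud_run_bad` labels, and the sample strings are hypothetical and are not taken from the attached file.

```python
import difflib


def diff_ocr_outputs(expected_lines, actual_lines):
    """Return a unified diff between a known-good and a suspect OCR result."""
    return list(
        difflib.unified_diff(
            expected_lines,
            actual_lines,
            fromfile="local_ok",        # hypothetical label for the working run
            tofile="cloud_run_bad",     # hypothetical label for the failing run
            lineterm="",
        )
    )


def similarity(expected_lines, actual_lines):
    """Return a 0..1 similarity ratio between the two results."""
    return difflib.SequenceMatcher(
        None, "\n".join(expected_lines), "\n".join(actual_lines)
    ).ratio()


if __name__ == "__main__":
    # Hypothetical sample strings, for illustration only.
    good = ["Senior Data Engineer", "Bengaluru, India"]
    bad = ["S3ni0r D@ta Eng1n33r", "Bengaluru, India"]
    print("\n".join(diff_ocr_outputs(good, bad)))
    print("similarity:", round(similarity(good, bad), 3))
```

Posting the diff together with the similarity ratio for the same input file shows whether the abnormal output is partially degraded or entirely unrelated to the expected text.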

DipanshuJuneja commented 2 days ago

I understand. Can you please let me know what I can share to debug this, since there is nothing in the logs? If it helps, I tried changing the FastAPI version and got correct OCR text results for a while, but now I'm seeing the issue again. The versions I currently use are:

fastapi==0.111.0
fastapi-cli==0.0.4
uvicorn==0.30.1

I thought it had to do with FastAPI version pinning, since I noticed I wasn't pinning versions in my prod environment but was doing so in my local environment, which has been running without any issue. So now I'm using the same FastAPI versions in both. Could it also have something to do with the Google Cloud Run configuration? Thanks.
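
To check whether the two environments really match, one option is to log a snapshot of the installed package versions and basic host details at service startup, both locally and on Cloud Run, and compare the two. This is a hedged sketch that uses only the standard library; the `environment_snapshot` helper and the package list are assumptions for illustration, and a difference between snapshots is a lead to investigate, not a confirmed cause.

```python
import json
import os
import platform
from importlib import metadata

# Packages named in this thread; extend the tuple as needed.
PACKAGES = ("paddleocr", "paddlepaddle", "fastapi", "fastapi-cli", "uvicorn")


def environment_snapshot(packages=PACKAGES):
    """Collect installed versions and basic host details for comparison."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Print once at startup so the snapshot appears in the service logs.
    print(json.dumps(environment_snapshot(), indent=2, sort_keys=True))
```

Running this in both places and diffing the JSON makes unpinned or drifting dependencies visible without changing the OCR code itself.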

jingsongliujing commented 2 days ago

If you are using the CPU version, it is recommended to convert to the ONNX model and use ONNX Runtime for inference:

https://paddlepaddle.github.io/PaddleOCR/en/ppocr/infer_deploy/paddle2onnx.html#paddle-model-download
https://github.com/jingsongliujing/OnnxOCR
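
After converting a model by following the first link, a quick sanity check is to load the resulting file with ONNX Runtime on CPU and list its inputs and outputs. The sketch below assumes the `onnxruntime` package is installed and that the converted file path is supplied by the caller; `describe_io` and `inspect_onnx_model` are hypothetical helper names, and the snippet does not run any OCR pre- or post-processing.

```python
def describe_io(nodes):
    """Format ONNX Runtime input/output descriptors as readable strings."""
    return [f"{n.name}: type={n.type}, shape={n.shape}" for n in nodes]


def inspect_onnx_model(model_path):
    """Load a converted model on CPU and report its inputs and outputs.

    Assumes onnxruntime is installed; model_path is whatever file the
    Paddle-to-ONNX conversion step produced.
    """
    import onnxruntime as ort  # imported lazily so the helper above has no dependency

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return {
        "inputs": describe_io(session.get_inputs()),
        "outputs": describe_io(session.get_outputs()),
    }
```

If the session loads and the reported shapes look as expected, the converted model is at least readable by ONNX Runtime; full inference still needs the detection and recognition pre- and post-processing, which the linked OnnxOCR repository covers.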

If you are using GPU for inference, it is suggested to upgrade PaddlePaddle to version 3.0:

python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/