PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

PaddleOCR throwing fscanf failed #12138

Open subhankardori opened 6 months ago

subhankardori commented 6 months ago

W0517 10:10:58.219796 252 default_variables.cpp:95] Fail to fscanf: Success [0]

By any chance, is this related to a core dump? I ask because when I was using the latest version of paddlepaddle-gpu, I kept hitting this error inside the Docker container:

root@f76833e33ee6:/code/build/data# python3 paddle_and_annotate.py 
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   inflateReset2

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1715786295 (unix time) try "date -d @1715786295" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 207 (TID 0x71e2902e2b80) from PID 0 ***]

Segmentation fault (core dumped)
root@f76833e33ee6:/code/build/data# 

The mysterious part is that when I ran a simple PaddleOCR script, it ran without problems, but when I tested an end-to-end operation (of which that PaddleOCR snippet is one part), the following warning was logged: W0517 10:10:58.219796 252 default_variables.cpp:95] Fail to fscanf: Success [0] Is there anything concerning about this log?

SWHL commented 6 months ago

First, it is recommended to verify that the PaddlePaddle framework is successfully installed.

import paddle

paddle.utils.run_check()
# PaddlePaddle is installed successfully!

Then, please provide the smallest reproducible demo.

MonolithFoundation commented 6 months ago

I got the same issue. Please don't always make users provide a reproduction.

image

So here I am, and it is reproducible.

SWHL commented 6 months ago

Please provide the basic details of your operating environment so that we can reproduce this error locally and make the necessary modifications:

OS:
PaddleOCR version:
Python version:
Paddle version:
Code:

If you only provide the error message without any context, we cannot solve it here, because we don't know in what environment the error was reported.
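The version fields in the template above can be collected with a short script. This is a minimal sketch using only the Python standard library; it assumes the packages were installed with pip under the distribution names paddleocr, paddlepaddle-gpu, or paddlepaddle, and the helper names are hypothetical.

```python
import platform
from importlib import metadata


def package_version(*names):
    """Return the version of the first installed distribution among names."""
    for name in names:
        try:
            return f"{metadata.version(name)} ({name})"
        except metadata.PackageNotFoundError:
            continue
    return "not installed"


def environment_report():
    """Collect the OS and version fields requested in the template."""
    return {
        "OS": platform.platform(),
        "PaddleOCR version": package_version("paddleocr"),
        "Python version": platform.python_version(),
        # Assumption: Paddle was installed as one of these two distributions.
        "Paddle version": package_version("paddlepaddle-gpu", "paddlepaddle"),
    }


if __name__ == "__main__":
    for field, value in environment_report().items():
        print(f"{field}: {value}")
```

Running it inside the container and on the host gives two reports that can be compared line by line.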

MonolithFoundation commented 6 months ago

I have uninstalled paddlepaddleocr.

subhankardori commented 6 months ago

@SWHL W0605 10:33:27.962114 83 default_variables.cpp:95] Fail to fscanf: Success [0] This is just a C++ warning. Please give me a way to force-suppress it, because it is a real hindrance when reading logs in production. I searched everywhere online and asked ChatGPT, but didn't find anything.

SWHL commented 6 months ago

@subhankardori You are right, I mistakenly thought it was an error log.

subhankardori commented 6 months ago

OS: Ubuntu 22.04.1 LTS
PaddleOCR version: 2.7.3
Python version: 3.10.12
Paddle version: 2.5.2 (I had to downgrade paddlepaddle-gpu, since the latest version wasn't working inside the container)
Code: can't be disclosed, but it is essentially the same as this sample inference code:

from paddleocr import PaddleOCR
import os
import numpy as np
import glob
from time import time
# Setup model
ocr_model = PaddleOCR(lang='en', use_angle_cls=True, use_gpu=True)

# Source and destination directories
src_dir = 'abc'
dst_dir = 'xyz'

# Ensure destination directory exists
os.makedirs(dst_dir, exist_ok=True)

# Get list of image files in the source directory
image_paths = glob.glob(os.path.join(src_dir, '*'))

for img_path in image_paths:
    # Running the OCR method on the model
    t1 = time()
    result = ocr_model.ocr(img_path)
    t2 = time()
    print("PaddleOCR time:", t2 - t1)

    # Extracting detected components
    boxes = [res[0] for res in result[0]]  # The bounding boxes
    texts = [res[1][0] for res in result[0]]  # The recognized texts
    scores = [res[1][1] for res in result[0]]  # The confidence scores
    print(boxes, texts, scores)
    # # Import the image
    # img = cv2.imread(img_path)
    # ann=img.copy()
    # # Reorder the color channels
    # img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # # Draw annotations on the image using OpenCV
    # for box, text, score in zip(boxes, texts, scores):
    #     # Draw the bounding box
    #     box = np.array(box).astype(int)
    #     cv2.polylines(ann, [box], isClosed=True, color=(0, 255, 0), thickness=2)

    #     # Combine text with confidence score
    #     label = f'{text} ({score:.2f})'

    #     # Draw the text and confidence score
    #     cv2.putText(ann, label, (box[0][0], box[0][1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    # # Get the filename from the image path
    # filename = os.path.basename(img_path)

    # # Save the annotated image to the destination directory
    # cv2.imwrite(os.path.join(dst_dir, filename), ann)

One key point, though: this only occurs inside the Docker container, not when I run PaddleOCR inference on the local machine.
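As a side note on the sample above, the three list comprehensions index result[0] directly, so they would raise a TypeError if result[0] were ever None. The helper below is a minimal sketch that unpacks one return value defensively. It assumes the layout used in the sample, where each entry is [box, (text, score)], and it assumes (not confirmed in this thread) that an image with no detected text may come back as None or an empty list. The name unpack_ocr_result is hypothetical.

```python
def unpack_ocr_result(result):
    """Split one ocr() return value into boxes, texts, and scores.

    Assumes result[0] is a list of [box, (text, score)] entries, as in the
    sample above. Treating a None or empty page as "no detections" is an
    assumption, not documented behaviour.
    """
    page = result[0] if result else None
    if not page:
        return [], [], []
    boxes = [entry[0] for entry in page]      # bounding boxes
    texts = [entry[1][0] for entry in page]   # recognized texts
    scores = [entry[1][1] for entry in page]  # confidence scores
    return boxes, texts, scores
```

Inside the loop it would be called as boxes, texts, scores = unpack_ocr_result(result).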

SWHL commented 6 months ago

That's odd, and it is beyond my expertise. Let's wait and see whether anyone else has encountered this problem before.

subhankardori commented 6 months ago

Here are some things I tried in order to suppress it. They didn't work, but they may serve as pointers:

import os
import sys

# Redirect stderr to /dev/null
sys.stderr = open(os.devnull, 'w')

# Your code here

OR

import warnings
warnings.filterwarnings("ignore", message="Fail to fscanf.*")

Neither approach suppressed this C++ warning.
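A likely reason both attempts fail is that they only act at the Python level. Reassigning sys.stderr changes where Python code writes, and warnings.filterwarnings only filters messages raised through Python's warnings module. Native C++ code normally writes straight to the operating system's file descriptor 2. The sketch below redirects that descriptor itself with os.dup2. It assumes the warning is written to stderr, which this thread does not confirm, and the name redirect_fd is hypothetical. Note that it hides all native stderr output while active, including genuine errors such as the segmentation fault report above.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def redirect_fd(fd=2, target=os.devnull):
    """Temporarily point an OS-level file descriptor (default: stderr) at target."""
    sys.stderr.flush()              # flush Python-side buffered text first
    saved_fd = os.dup(fd)           # keep a copy of the original descriptor
    target_fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.dup2(target_fd, fd)      # native writes to fd now go to target
        yield
    finally:
        os.dup2(saved_fd, fd)       # restore the original destination
        os.close(saved_fd)
        os.close(target_fd)


# Hypothetical usage: wrap whichever import or call emits the log line.
# with redirect_fd(2):
#     from paddleocr import PaddleOCR
#     ocr_model = PaddleOCR(lang='en')
```

If the message is emitted by a background thread after the with block exits, it will still appear. In that case the redirect would have to stay in place for the lifetime of the process, or the log could be filtered outside the process instead.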

MonolithFoundation commented 6 months ago

Just uninstall paddleocr; it's really bad to use.

SWHL commented 6 months ago

@MonolithFoundation We are trying to make it easier to use. If you don't like it, just don't use it, and don't let it affect your mood.

subhankardori commented 6 months ago

@MonolithFoundation Chill, man. Patience is key, and we will sort it out collaboratively. PaddleOCR is by far the best scene text recognizer I have used, outperforming paid OCR APIs.

SWHL commented 6 months ago

@subhankardori I suspect it's an issue with the PaddlePaddle framework, so I raised an issue in the Paddle repository. We can wait and see how they respond. The link is https://github.com/PaddlePaddle/Paddle/issues/64969

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 90 days with no activity.