PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0

PaddleOCR throwing fscanf failed #12138

Open subhankardori opened 6 months ago

subhankardori commented 6 months ago

W0517 10:10:58.219796 252 default_variables.cpp:95] Fail to fscanf: Success [0]

By any chance, is this related to a core dump? I ask because when I was using the latest version of paddlepaddle-gpu, I kept hitting this error inside the Docker container:

root@f76833e33ee6:/code/build/data# python3 
C++ Traceback (most recent call last):
0   inflateReset2

Error Message Summary:
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1715786295 (unix time) try "date -d @1715786295" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 207 (TID 0x71e2902e2b80) from PID 0 ***]

Segmentation fault (core dumped)

The mysterious part is that a simple PaddleOCR script ran without trouble, but when I tested an end-to-end operation (of which that PaddleOCR snippet was a part), I got this warning log: W0517 10:10:58.219796 252 default_variables.cpp:95] Fail to fscanf: Success [0]. Is there anything concerning about this log?

SWHL commented 6 months ago

First, it is recommended to verify that the PaddlePaddle framework is successfully installed.

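import paddle

# Runs Paddle's built-in installation self-check
paddle.utils.run_check()
# The output should include: PaddlePaddle is installed successfully!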

Then, please provide a minimal reproducible demo.

MonolithFoundation commented 6 months ago

I got the same issue. Please don't always ask users to provide a reproduction.

The fact that I am here with the same problem shows it is reproducible.

SWHL commented 6 months ago

Please provide the basic operating environment so that we can reproduce this error locally and make the necessary modifications:

OS:
PaddleOCR version:
Python version:
Paddle version:
Code:

If you only provide the error message without any context, we cannot solve it here, because we don't know in what environment the error was reported.
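The fields requested above can be gathered with a short script. This is a minimal sketch, not an official PaddleOCR tool: the helper name `collect_environment` is hypothetical, and it assumes the packages were installed under the pip distribution names `paddleocr`, `paddlepaddle`, or `paddlepaddle-gpu`.

```python
import platform
from importlib import metadata


def collect_environment(packages=("paddleocr", "paddlepaddle", "paddlepaddle-gpu")):
    """Return a dict with the OS, Python version, and installed package versions."""
    info = {
        "OS": platform.platform(),
        "Python version": platform.python_version(),
    }
    for name in packages:
        try:
            # Looks up the version recorded in the installed package metadata
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


# Print one "field: value" line per entry, ready to paste into an issue
for key, value in collect_environment().items():
    print(f"{key}: {value}")
```

Running it inside the container and on the host makes it easy to compare the two environments side by side.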

MonolithFoundation commented 6 months ago

I have uninstalled paddlepaddleocr.

subhankardori commented 6 months ago

@SWHL W0605 10:33:27.962114 83 default_variables.cpp:95] Fail to fscanf: Success [0] is just a C++ warning. Please give me a way to force-suppress it; it causes a lot of hindrance when reading logs in production. I searched everywhere on the net and asked ChatGPT, but didn't find anything.

SWHL commented 6 months ago

@subhankardori you are right, I mistakenly thought it was an error log

subhankardori commented 6 months ago

OS: Ubuntu 22.04.1 LTS
PaddleOCR version: 2.7.3
Python version: 3.10.12
Paddle version: 2.5.2 (I had to downgrade paddlepaddle-gpu, since the latest version wasn't working inside the container)
Code: can't be disclosed, but it is essentially the same as this sample inference code:

from paddleocr import PaddleOCR
import os
import numpy as np
import glob
from time import time
# Setup model
ocr_model = PaddleOCR(lang='en', use_angle_cls=True, use_gpu=True)

# Source and destination directories
src_dir = 'abc'
dst_dir = 'xyz'

# Ensure destination directory exists
os.makedirs(dst_dir, exist_ok=True)

# Get list of image files in the source directory
image_paths = glob.glob(os.path.join(src_dir, '*'))

for img_path in image_paths:
    # Running the OCR method on the model, timing the call
    t1 = time()
    result = ocr_model.ocr(img_path)
    t2 = time()
    print("PaddleOCR time:", (t2 - t1))

    # Extracting detected components
    boxes = [res[0] for res in result[0]]  # The bounding boxes
    texts = [res[1][0] for res in result[0]]  # The recognized texts
    scores = [res[1][1] for res in result[0]]  # The confidence scores
    print(boxes, texts, scores)
    # # Import the image
    # img = cv2.imread(img_path)
    # ann=img.copy()
    # # Reorder the color channels
    # img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # # Draw annotations on the image using OpenCV
    # for box, text, score in zip(boxes, texts, scores):
    #     # Draw the bounding box
    #     box = np.array(box).astype(int)
    #     cv2.polylines(ann, [box], isClosed=True, color=(0, 255, 0), thickness=2)

    #     # Combine text with confidence score
    #     label = f'{text} ({score:.2f})'

    #     # Draw the text and confidence score
    #     cv2.putText(ann, label, (box[0][0], box[0][1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    # # Get the filename from the image path
    # filename = os.path.basename(img_path)

    # # Save the annotated image to the destination directory
    # cv2.imwrite(os.path.join(dst_dir, filename), ann)

One key point, though: this occurs only inside the Docker container, not when I run PaddleOCR inference on my local machine.

SWHL commented 6 months ago

That's weird, and it is beyond my expertise. Let's wait and see whether others have encountered this problem before.

subhankardori commented 6 months ago

Here are some things I tried in order to suppress it. They didn't work, but they may act as pointers:

import os
import sys

# Redirect stderr to /dev/null
sys.stderr = open(os.devnull, 'w')

# Your code here


import warnings
warnings.filterwarnings("ignore", message="Fail to fscanf.*")

Neither of these suppressed the C++ warning.
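A likely reason both attempts fail: `warnings.filterwarnings` only filters warnings raised through Python's `warnings` module, and reassigning `sys.stderr` only changes the Python-level stream object. Native code that writes straight to file descriptor 2 is unaffected by either. Assuming that is how this message is emitted, a minimal sketch of a descriptor-level redirect follows. The helper name `suppress_native_stderr` is hypothetical and is not part of PaddleOCR or Paddle.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_native_stderr():
    """Temporarily point file descriptor 2 at os.devnull."""
    sys.stderr.flush()
    saved_fd = os.dup(2)  # keep a copy of the real stderr descriptor
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)  # fd 2 now discards everything written to it
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore the real stderr
        os.close(devnull_fd)
        os.close(saved_fd)


# os.write(2, ...) stands in here for a write made by native code
with suppress_native_stderr():
    os.write(2, b"this line is discarded\n")
```

In the script above, the model construction and the `ocr_model.ocr(img_path)` call would go inside the `with` block. Two caveats: this also hides genuine error output for as long as the block is active, and if the warning comes from a background thread outside the block it will still appear, in which case the redirect would have to cover the whole process or the line would have to be filtered at the log collection level.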

MonolithFoundation commented 6 months ago

Just uninstall paddleocr, it's really bad to use.

SWHL commented 6 months ago

@MonolithFoundation We are trying to make it easier to use. If you don't like it, just don't use it, and don't let it affect your mood.

subhankardori commented 6 months ago

@MonolithFoundation chill man, patience is the key; we will sort it out collaboratively. PaddleOCR is by far the best scene text recognizer I have used, outperforming paid OCR APIs.

SWHL commented 6 months ago

@subhankardori I suspect it's an issue with the PaddlePaddle framework, so I raised an issue in the Paddle repository. We can wait and see how they respond. The link is

github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 90 days with no activity.