TeluguOCR / banti_telugu_ocr

End to end OCR system for Telugu. Based on Convolutional Neural Networks.
Apache License 2.0

Segmentation fault with banti segmenter #8

Closed: ChillarAnand closed this issue 8 years ago

ChillarAnand commented 8 years ago

The OCR works very well with the given sample data.

When I tried to convert a test image, the segmenter crashed with a segmentation fault.

$ ./banti_segmenter ../ocr/biddala/0-007.converted.tif 
[1]    1695 segmentation fault (core dumped)  ./banti_segmenter ../ocr/biddala/0-007.converted.tif

Full trace:

# anand at anand in ~/projects/python/banti_telugu_ocr on git:master o [16:06:48]
$ python3  recognize.py ../ocr/biddala/0-007.png 
Command line Arguments
        banti_segmenter     ./banti_segmenter
        calibration         1
        input_file_or_dir   ../ocr/biddala/0-007.png
        labels_fname        labellings/alphacodes.lbl
        log_level           20
        ngram_fname         library/mega.123.pkl
        nnet_fname          library/nn.pkl
        scaler_fname        scalings/relative48.scl

Initializing the OCR
Compiling full test function...
Done
Launched command with timeout=10
"convert -units PixelsPerInch ../ocr/biddala/0-007.png -compress Group4 -depth 1 -resample 400 ../ocr/biddala/0-007.converted.tif"
Success
STDOUT:

STDERR:

Launched command with timeout=10
"./banti_segmenter ../ocr/biddala/0-007.converted.tif"
Success
STDOUT:

STDERR:

Launched command with timeout=10
"./banti_segmenter ../ocr/biddala/0-007.converted.tif"
Success
STDOUT:

STDERR:

Traceback (most recent call last):
  File "recognize.py", line 229, in <module>
    recognizer.ocr_box_file(box_fname)
  File "/home/anand/projects/python/banti_telugu_ocr/ocr.py", line 48, in ocr_box_file
    bf = BantryFile(box_fname)
  File "/home/anand/projects/python/banti_telugu_ocr/bantry.py", line 155, in __init__
    in_file = open(name)
FileNotFoundError: [Errno 2] No such file or directory: '../ocr/biddala/0-007.converted.box'

Image used: 0-007

rakeshvar commented 8 years ago

Our OCR does not work on Ranganayakamma's writings :) Raising the stack size limit with ulimit -s 1000000 should work.
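For anyone who prefers not to change the shell limit by hand, the same idea can be sketched from Python. This is only an illustration, not code from this repo: the helper names raise_stack_limit, describe_exit and run_segmenter are hypothetical, and it assumes a Unix system where the resource module is available. It also reports when the child was killed by a signal, since the log above prints "Success" for the segmenter even though no .box file was produced.

```python
import resource
import signal
import subprocess


def raise_stack_limit(stack_kb=1000000):
    """Raise the soft stack limit, mirroring `ulimit -s <stack_kb>`."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return  # already unlimited, nothing to raise
    wanted = stack_kb * 1024  # ulimit -s counts KiB, setrlimit counts bytes
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # the soft limit cannot exceed the hard one
    if wanted > soft:
        resource.setrlimit(resource.RLIMIT_STACK, (wanted, hard))


def describe_exit(returncode):
    """Turn a subprocess return code into a readable status."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exit code %d" % returncode


def run_segmenter(cmd, timeout=10):
    """Run cmd (hypothetical wrapper) with the raised stack limit applied in the child."""
    proc = subprocess.run(cmd, preexec_fn=raise_stack_limit, timeout=timeout,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, describe_exit(proc.returncode)
```

Calling run_segmenter(["./banti_segmenter", "../ocr/biddala/0-007.converted.tif"]) would then return the raw code together with a description, so a crash in the segmenter shows up before the later lookup of the .box file fails.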

ChillarAnand commented 8 years ago

Thank you :)

ChillarAnand commented 7 years ago

I reformatted my system recently, and banti is crashing again.

→ ulimit -s        
1000000
→ python recognize.py sample_images/praasa.tif
...
...
Finding most likely sentences...
Line  0
[1]    9622 floating point exception (core dumped)  python recognize.py sample_images/praasa.tif

rakeshvar commented 7 years ago

Can you send me the log file, prasa.debug.log or prasa.info.log? You can specify the log level via a command-line argument; see the help (-h). This is a different problem from the earlier one. Back then we were using a C++-based antanci_segmenter from this repo; now we have pure Python code. Since it is a floating point exception, it could also be a problem with PIL: the new version does not support zero-sized arrays! But that is just a guess.
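As a small aid for picking the log level: the argument dump earlier in this thread shows log_level 20. Assuming recognize.py hands that number to Python's logging module unchanged (the thread does not confirm this), the numeric values map to the standard level names as sketched below. The helper names level_name and logs_debug are hypothetical.

```python
import logging


def level_name(log_level):
    """Return the standard logging name for a numeric level."""
    return logging.getLevelName(log_level)


def logs_debug(log_level):
    """True when a logger at this level would also emit DEBUG messages."""
    return log_level <= logging.DEBUG


# 20 is the value shown in the argument dump; 10 is the standard DEBUG level.
for value in (20, 10):
    print(value, level_name(value), logs_debug(value))
```

Under that assumption, the default of 20 corresponds to INFO, and 10 would be the value to try for a debug log.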

ChillarAnand commented 7 years ago

You are right. This was happening because of https://github.com/TeluguOCR/banti_telugu_ocr/issues/16. I downgraded and pinned pillow to Pillow==3.4.2, and it's working fine now.
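A quick way to confirm that an environment still matches the pin is sketched below. It is not part of banti; the helper names installed_version and check_pin are hypothetical, and it assumes Python 3.8 or newer for importlib.metadata.

```python
from importlib import metadata

PINNED = "3.4.2"  # the Pillow version reported as working in this thread


def installed_version(package="Pillow"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(installed, pinned=PINNED):
    """Compare an installed version string against the pinned one."""
    if installed is None:
        return "Pillow is not installed"
    if installed == pinned:
        return "Pillow %s matches the pin" % installed
    return "Pillow %s differs from the pinned %s" % (installed, pinned)


if __name__ == "__main__":
    print(check_pin(installed_version()))
```

Running this after a fresh install makes it obvious when a newer Pillow has been pulled in again.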