gnana70 / tamil_ocr

OCR Tamil is a powerful tool that can detect and recognize text in Tamil images with high accuracy on Natural Scenes
https://github.com/gnana70/tamil_ocr
MIT License

Error when doing both detection and recognition on image #54

Closed Dario-Mantegazza closed 8 months ago

Dario-Mantegazza commented 8 months ago

Hi, I'm testing your code on a scene text detection + recognition (STD+R) task on a private dataset. It works for some images, but for others I get this error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[4], line 3
      1 for image_ in images:
      2     print(image_)
----> 3     texts = ocr.predict(image_)
      4     print(texts)

File /usr/local/lib/python3.10/dist-packages/ocr_tamil/ocr.py:447, in OCR.predict(self, image)
    445 image = self.read_image_input(image)
    446 if self.detect:
--> 447     exported_regions,updated_prediction_result = self.craft_detect(image)
    448     inter_text_list,conf_list = self.text_recognize_batch(exported_regions)
    449     text_list = [self.output_formatter(inter_text_list,conf_list,updated_prediction_result)]

File /usr/local/lib/python3.10/dist-packages/ocr_tamil/ocr.py:254, in OCR.craft_detect(self, image, **kwargs)
    251     if w>0 and h>0:
    252         new_bbox.append([x,y,w,h])
--> 254 ordered_new_bbox,line_info = self.sort_bboxes(new_bbox)
    256 updated_prediction_result = []
    257 for ordered_bbox in ordered_new_bbox:

File /usr/local/lib/python3.10/dist-packages/ocr_tamil/ocr.py:198, in OCR.sort_bboxes(self, contours)
    196 def sort_bboxes(self,contours):
    197     c = np.array(contours)
--> 198     max_height = np.median(c[::, 3]) * 0.5
    200     # Sort the contours by y-value
    201     by_y = sorted(contours, key=lambda x: x[1])  # y values

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Can you give me a hand with it?
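For context, the failing index has a simple trigger: if `craft_detect` finds no text regions, `new_bbox` stays an empty list, so `np.array(new_bbox)` is 1-dimensional and the 2-D index `c[::, 3]` raises exactly this error. A minimal sketch of the failure (variable names taken from the traceback, not from the library itself):

```python
import numpy as np

# When no text regions are detected, new_bbox stays empty.
new_bbox = []           # would otherwise hold [x, y, w, h] rows
c = np.array(new_bbox)  # shape (0,): 1-dimensional, not (0, 4)

try:
    # The line from sort_bboxes: indexing column 3 assumes a 2-D array.
    max_height = np.median(c[::, 3]) * 0.5
except IndexError as e:
    print(e)  # too many indices for array: array is 1-dimensional, ...
```

With at least one detected box, `np.array` produces a 2-D array and the column index works as intended.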

Dario-Mantegazza commented 8 months ago

Unfortunately I cannot share examples of the images, but I can say that the code fails this way when no readable (low-res) text is present. For now I'm handling it with a try/except and printing 'no readable text found' in the except block.
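The workaround above can be sketched as a small wrapper. `safe_predict` and the stub predictor are hypothetical names used for illustration only; they are not part of ocr_tamil:

```python
def safe_predict(predict_fn, image, fallback="no readable text found"):
    """Call a predict function, catching the IndexError raised on
    images with no readable text and returning a fallback message."""
    try:
        return predict_fn(image)
    except IndexError:
        return fallback

# Stub standing in for ocr.predict, to show the guard in isolation.
def stub_predict(image):
    if image == "blank.jpg":
        raise IndexError("too many indices for array")
    return ["detected text"]

print(safe_predict(stub_predict, "page.jpg"))   # ['detected text']
print(safe_predict(stub_predict, "blank.jpg"))  # no readable text found
```

A broad `except` would also hide unrelated bugs, so catching only `IndexError` keeps the guard narrow.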

gnana70 commented 8 months ago

Thanks for reporting this @Dario-Mantegazza, I will handle this in the next version of the code.

A quick hack that may help with low-resolution text: lower the `text_threshold` value (it accepts any value between 0 and 1). This can be done when initializing the OCR module, e.g. `ocr = OCR(detect=True, text_threshold=0.2)`.

gnana70 commented 8 months ago

@Dario-Mantegazza, please update the ocr_tamil package using the command below.

pip install -U ocr_tamil

Dario-Mantegazza commented 8 months ago

Thanks for the quick reply, I will also try lowering that threshold. (Off topic: it would be ideal to have a rough guide/doc on the different parameters and their effects.) Regarding updating the package: I'm now rebuilding my container image and will update you on the results :)

Dario-Mantegazza commented 8 months ago

I'm getting the same error. The container build installed version 0.3.6, I don't know if this helps. At least by lowering the threshold, more images are now read without errors. Edit: I think I have a picture I can share that fails even with the low threshold. Here it is clear that no text is readable; this sometimes happens in our pipeline. [image attached]

gnana70 commented 8 months ago

@Dario-Mantegazza, I couldn't reproduce the issue. I ran inference on the image you shared above; attached is the Colab notebook I used.

https://colab.research.google.com/drive/11tMrKEeMtoKILwFP8TeChnI8_bEIeulA?usp=sharing

gnana70 commented 8 months ago

[screenshot of inference results attached]

Dario-Mantegazza commented 8 months ago

> @Dario-Mantegazza , Couldn't able to reproduce the issue. I have run the inference on the image you have shared above and attached the colab notebook that I have used to run this.
>
> https://colab.research.google.com/drive/11tMrKEeMtoKILwFP8TeChnI8_bEIeulA?usp=sharing

OK, I will check if it is a problem with my setup and let you know next week. Thanks for your support in the meantime, it was super

Dario-Mantegazza commented 8 months ago

Sorry, I hit some unforeseen delays; I will probably get back to you next week.

Dario-Mantegazza commented 8 months ago

OK, it works. Earlier I was probably working with a cached version of the lib; my setup is not straightforward, so this could have happened. Thanks for fixing the bug. If I have more issues like this I will post them here, but I'm sure that won't happen :)