gnana70 / tamil_ocr

OCR Tamil is a powerful tool that can detect and recognize text in Tamil images with high accuracy on Natural Scenes
https://github.com/gnana70/tamil_ocr
MIT License

No speed advantage when using batches. #58

Open Dario-Mantegazza opened 5 months ago

Dario-Mantegazza commented 5 months ago

I ran some tests using both detection + recognition on a set of 30 images, and I saw no speed improvement when using batches. So I checked the code, and if I understood your implementation correctly, https://github.com/gnana70/tamil_ocr/blob/71a91db47ba76c2c6b4612e68276da5077edc47a/ocr_tamil/ocr.py#L527-L536 you split the batch into single images, pass each image to CRAFT, get the BBs, and pass those to ParSeq.

I'm not an expert in ParSeq, but if it can already deal with batches of BBs, why not simply take all the BBs from the whole batch and pass them as a single input to ParSeq?

To recap, my suggestion is to do something like the following:

bbs = []
for image in batch:
    bb_preds = craft(image)
    bbs.append(bb_preds)
texts = parseq_read_batch(bbs)

This should be faster, as you call ParSeq only once per batch rather than once per image, albeit at a larger memory cost, which can be managed via the batch size parameter.
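A minimal runnable sketch of this pattern, where `craft_detect` and `parseq_recognize_batch` are hypothetical stand-ins (the real detection and recognition APIs in ocr_tamil differ):

```python
import numpy as np

def craft_detect(image):
    """Stand-in for CRAFT detection: returns word crops for one image.
    (Hypothetical; here each image just yields two fixed slices.)"""
    return [image[:32, :128], image[:32, 128:256]]

def parseq_recognize_batch(crops):
    """Stand-in for a single batched ParSeq call over all crops."""
    return [f"word_{i}" for i, _ in enumerate(crops)]

def read_batch(batch):
    # Collect crops from every image in the batch first...
    crops = []
    for image in batch:
        crops.extend(craft_detect(image))
    # ...then recognize them all in ONE call, instead of one call per image.
    return parseq_recognize_batch(crops)

batch = [np.zeros((64, 256, 3), dtype=np.uint8) for _ in range(3)]
texts = read_batch(batch)
print(len(texts))  # 3 images x 2 crops each -> 6
```

The point is only the call structure: one recognizer invocation per batch, with memory bounded by how many crops you accumulate.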

Obviously even better would be to do something like:

bbs=craft_batch(batch)
texts=parseq_batch(bbs)
Dario-Mantegazza commented 5 months ago

Apparently CRAFT can run in batches, here

I think running the inference in parallel is difficult due to the post-processing step, which is performed in CPU unless you use multi-processing technique. However, the batch-processing of deep networks is possible within a memory limit.

https://github.com/clovaai/CRAFT-pytorch/issues/44#issuecomment-533797024

and in other comments in the issue section of CRAFT's GitHub, it is stated that batch prediction is feasible. It would be interesting if the batch functionality of ocr_tamil exploited this.
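Following the linked comment, one way the deep-network part could be batched (while keeping the CPU post-processing per image) is to pad variable-size images to a common size and run a single forward pass. A sketch of just the padding/stacking step, using NumPy with made-up image sizes; this is not the CRAFT preprocessing code itself:

```python
import numpy as np

def pad_and_stack(images):
    """Zero-pad variable-size (H, W, C) images to a common H x W and stack
    them into one (N, H, W, C) array for a single batched forward pass."""
    h = max(img.shape[0] for img in images)
    w = max(img.shape[1] for img in images)
    batch = np.zeros((len(images), h, w, images[0].shape[2]),
                     dtype=images[0].dtype)
    for i, img in enumerate(images):
        # Top-left placement; the padding region stays zero.
        batch[i, :img.shape[0], :img.shape[1]] = img
    return batch

images = [np.ones((100, 200, 3), dtype=np.float32),
          np.ones((150, 120, 3), dtype=np.float32)]
batch = pad_and_stack(images)
print(batch.shape)  # (2, 150, 200, 3)
```

The batched network output would then be split back into per-image score maps for the (still sequential, CPU-bound) post-processing.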

Dario-Mantegazza commented 5 months ago

Also, I think it would make more sense to decouple the batch size used by ParSeq for text recognition from the tamil_ocr batch size parameter; these should be two separate numbers. I like this library, please keep working on it :)
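The decoupling suggested above amounts to re-chunking the flat list of word crops into recognizer-sized mini-batches, independent of how many images were passed in. A small sketch with hypothetical numbers (4 images yielding 10 crops, recognized with a separate ParSeq batch size of 3):

```python
def chunked(items, size):
    """Split a flat list into mini-batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 10 word crops collected across the whole image batch, stood in for by ints.
crops = list(range(10))

# Recognizer batch size chosen independently of the image batch size.
mini_batches = chunked(crops, 3)
print([len(b) for b in mini_batches])  # [3, 3, 3, 1]
```

Each mini-batch would then be one ParSeq forward pass, so GPU memory for recognition is bounded by the recognizer's own batch size, not by how many crops the detector happened to produce.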

gnana70 commented 5 months ago

Hi @Dario-Mantegazza, thanks for your feedback. I will try to include batch mode for CRAFT text detection in the coming weeks.

Dario-Mantegazza commented 5 months ago

Hi again @gnana70, in the meantime I will make a fork and see if I can implement a temporary workaround. I'll keep you posted. Cheers

gnana70 commented 5 months ago

Hi @Dario-Mantegazza , thanks for your help. Please share your workaround once done.

Dario-Mantegazza commented 5 months ago

So I tried to change the code in the simplest, hackiest way, but for now I don't get better performance; I think something is broken in my edited version, and while all the models accept batched input, something else is curbing the performance gain. I will upload my partially working version to my fork, but due to work deadlines I don't think I can spend more time on this.

gnana70 commented 5 months ago

@Dario-Mantegazza , no problem. I will investigate and fix it up

JamesDConley commented 5 months ago

Most of the processing time appears to be in the cv2/numpy code that extracts the detected word images from the main image. I swapped this code out for a simple min/max rectangle crop, and the time for a page I was testing went from 360 s to under 15 s.

For images with larger numbers of bounding boxes, this will be an even more drastic speedup, since it reduces the cost from 1-2 seconds per bounding box to around 1/100000 of a second per bounding box.

The only downside is that this doesn't straighten the text; it just pulls out an axis-aligned bounding box. That works for my use case, though, since I'm extracting from documents without any tilted text.
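The crop described above can be as simple as taking the min/max of the box corners and slicing the image, instead of a per-box perspective warp. A sketch (`minmax_crop` is a hypothetical name, not the function from the fork):

```python
import numpy as np

def minmax_crop(image, quad):
    """Crop the axis-aligned bounding rectangle of a (4, 2) quadrilateral
    of (x, y) points. No perspective warp: tilted text stays tilted, but
    the crop is a single cheap numpy slice."""
    xs, ys = quad[:, 0], quad[:, 1]
    x0, x1 = int(xs.min()), int(np.ceil(xs.max()))
    y0, y1 = int(ys.min()), int(np.ceil(ys.max()))
    return image[y0:y1, x0:x1]

image = np.zeros((100, 200, 3), dtype=np.uint8)
# Four (x, y) corner points of a detected word box.
quad = np.array([[10.0, 20.0], [60.0, 22.0], [58.0, 50.0], [12.0, 48.0]])
crop = minmax_crop(image, quad)
print(crop.shape)  # (30, 50, 3)
```

Slicing returns a view rather than resampling every pixel, which is why this is orders of magnitude faster than a warp per box.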

Here are the timings before and after for the portion of the code I was working in.

Before

Timer started!
Read Image took 0.00 seconds (0.00 seconds total)
Timer started!
    Got size took 0.00 seconds (0.00 seconds total)
    Got prediction took 11.34 seconds (11.34 seconds total)
    Transformed bboxes initial took 0.00 seconds (11.34 seconds total)
    Sorted bounding boxes took 0.00 seconds (11.34 seconds total)
    Updated prediction results took 0.00 seconds (11.34 seconds total)
    **Exported file paths took 348.48 seconds** (359.82 seconds total)
    Updated prediction results again took 0.00 seconds (359.82 seconds total)

After

Timer started!
Read Image took 0.00 seconds (0.00 seconds total)
Timer started!
    Got size took 0.00 seconds (0.00 seconds total)
    Got prediction took 11.08 seconds (11.08 seconds total)
    Transformed bboxes initial took 0.00 seconds (11.08 seconds total)
    Sorted bounding boxes took 0.00 seconds (11.08 seconds total)
    Updated prediction results took 0.00 seconds (11.08 seconds total)
    **Exported file paths took 0.01 seconds** (11.09 seconds total)
    Updated prediction results again took 0.00 seconds (11.09 seconds total)

Code is at https://github.com/JamesDConley/faster_tamil_ocr. I have a bit of debugging/testing left to do, but I'll likely have a PR tomorrow or the following night.