9sphere / text-detector

Locating texts in images using computer vision
MIT License

Not sure I'm doing this right #1

Open SB2020-eye opened 4 years ago

SB2020-eye commented 4 years ago

Hi, muthuspark.

I'm new to all of this (including Python). I'm wondering if you can help me with getting this to work correctly.

I first came upon https://muthu.co/extracting-text-regions-from-an-image-using-geometric-properties/. I tried some of the code, but I couldn't get it to give me any output - but I'm sure the problem is me.

Then I came here to GitHub. I couldn't get it to work; but again, it's probably me. I don't really understand how folders relate to each other in general (yet I realize this MUST be extremely basic). (I only have success when someone on GitHub has very, very elementary steps guiding me through every piece. :))

But I figured I was close. So I went to https://nbviewer.jupyter.org/github/muthuspark/text-detector/blob/master/notebooks/Text%20Segmentation%20in%20Image.ipynb. With this info, I worked through each step [1], [2], etc., adding plt.show() along the way to see if I was getting results. I managed to get some output (Yay!) - up through step [6]. (But it took my computer a very long time - especially step [6].)

But I am not entirely confident about my output. The images look a little different than your examples. I've attached my original (greatly reduced in size here) plus my results from steps [2], [3], [4], and [6].

Ultimately, my goal is really to get a box around every individual character - and then save each one as an image file. (I.e., I don't need boxes around whole words or paragraphs. But the saved letter images need to be lossless - totally original color and quality.)

If you are able and willing to offer me any guidance, it would be greatly appreciated!

[Attached images: p221v, Figure_2tdm, Figure_3tdm, Figure_4tdm, Figure_6tdm]

muthuspark commented 4 years ago

Hello @SB2020-eye, my apologies for the delayed response; I just saw the issue you opened. I am not sure if you are still looking for a solution, but I am adding this comment to help anyone else with a similar query.

  1. Your sample image was too big, which is why thresholding did not work well. I was using adaptive thresholding with a window size of 3. Small window sizes only work for small images, which was my case. I increased the window size to about 21 for your large image and was able to get a proper separation of background and foreground.

[image: foregroid]

A good window size is an important factor that can make or break any text recognition algorithm. There are many ways to choose this window size but it depends a lot on the size of the characters in the image. Building a generic window size predictor is quite difficult. It's usually selected based on heuristics.
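To illustrate why the window size matters, here is a minimal sketch of mean-based adaptive thresholding in plain NumPy. It is not the code from the notebook; the function name `adaptive_mean_threshold` and its `offset` parameter are made up for this example, and it assumes dark text on a lighter background.

```python
import numpy as np

def adaptive_mean_threshold(gray, window=21, offset=0.0):
    """Mark a pixel as foreground (True) when it is darker than the mean
    of the window x window neighbourhood centred on it, minus `offset`."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    pad = window // 2
    # Replicate the border so every pixel has a full neighbourhood.
    padded = np.pad(gray, pad, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Sum over each window from four corner lookups.
    window_sum = (
        integral[window:window + h, window:window + w]
        - integral[:h, window:window + w]
        - integral[window:window + h, :w]
        + integral[:h, :w]
    )
    local_mean = window_sum / (window * window)
    return gray < local_mean - offset

# Toy image: light background with one dark stroke, 9 pixels wide.
img = np.full((40, 40), 200.0)
img[5:35, 15:24] = 0.0
narrow = adaptive_mean_threshold(img, window=3)   # window narrower than the stroke
wide = adaptive_mean_threshold(img, window=21)    # window wider than the stroke
print(narrow[20, 19], wide[20, 19])               # value at the stroke centre
```

In this toy case the 3-pixel window sees only dark pixels at the stroke centre, so the centre is not marked as foreground, while the 21-pixel window includes some background and marks it. That is one plausible reason a small window breaks down when the characters are many pixels thick.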

  2. The algorithm runs very slowly on my machine too, mainly because of the size of the image it is processing. A faster machine may speed it up a bit. Also, there is one step where I discard connected components that have more than 2 children, because these are likely to be non-characters. I haven't been able to find a faster algorithm to compute this, so I usually skip this part when I need speed.

Also, I am discarding regions smaller than 15; this number will be much larger in your case. Basically, I am trying to get rid of regions that are too small compared to the full image. I go by the general rule of 15*(image_width/1000), so in your case I changed it to 101.

So after discarding the likely non-text regions, this is what I get:

[image: down1]
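The size rule of thumb can be written down as a small helper. This is only a sketch: the names `min_region_size` and `drop_small_regions` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and comparing both the width and the height against the cutoff is an assumption, since the comment does not say exactly which dimension is measured.

```python
def min_region_size(image_width, base=15, reference_width=1000):
    """Rule of thumb from the comment: base * (image_width / reference_width)."""
    return base * image_width / reference_width

def drop_small_regions(boxes, image_width):
    """Keep only boxes whose width and height both reach the cutoff.

    Assumption: a region counts as "too small" if either side is below
    the cutoff; the original notebook may measure size differently.
    """
    cutoff = min_region_size(image_width)
    return [
        (x1, y1, x2, y2)
        for (x1, y1, x2, y2) in boxes
        if (x2 - x1) >= cutoff and (y2 - y1) >= cutoff
    ]

# Example with a 2000-pixel-wide image, so the cutoff is twice the base value.
candidates = [(0, 0, 10, 10), (5, 5, 45, 55)]
print(min_region_size(2000))
print(drop_small_regions(candidates, image_width=2000))
```

Scaling the cutoff with the image width keeps the filter roughly equivalent across resolutions, which is the point of the rule.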

  3. Finally, after combining all the other overlapping boxes, I get the output below for the proposed algorithm:

[image: dow4]

  4. Extracting only the found regions, I get the image below.

[image: download (3)]
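For the step that combines overlapping boxes, one simple approach is to keep fusing any two boxes that overlap until nothing changes. The sketch below is an illustration under assumptions, not the notebook's implementation: `overlaps` and `merge_overlapping` are hypothetical names, and boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def overlaps(a, b):
    """True when boxes a and b, given as (x1, y1, x2, y2), share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_overlapping(boxes):
    """Repeatedly fuse overlapping boxes into their common bounding box."""
    merged = [tuple(b) for b in boxes]
    changed = True
    while changed:
        changed = False
        result = []
        for box in merged:
            for i, other in enumerate(result):
                if overlaps(box, other):
                    # Replace the stored box with the union of the two.
                    result[i] = (
                        min(box[0], other[0]), min(box[1], other[1]),
                        max(box[2], other[2]), max(box[3], other[3]),
                    )
                    changed = True
                    break
            else:
                # No overlap with anything kept so far.
                result.append(box)
        merged = result
    return merged

boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (30, 30, 40, 40)]
print(merge_overlapping(boxes))
```

The outer loop repeats because a freshly merged box can start overlapping a box that was previously separate. This quadratic version is fine for a few hundred boxes; a very large image may need something faster.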

You can find the running sample in this notebook here: https://github.com/9sphere/text-detector/blob/master/notebooks/User_sample_Segmentation_in_Image.ipynb

I see that you need individual characters; you may have to run a character segmentation algorithm on the identified boxes. I will add this as an enhancement to my algorithm. Let me know if you need any more help; I would be happy to guide you.