EuracBiomedicalResearch / RescueOMR

batch Optical Mark Recognition without foresight
https://www.thregr.org/~wavexx/software/RescueOMR/
GNU Affero General Public License v3.0

Inconsistent "insufficient model inliers" #4

Open. uyv-tbrothers opened this issue 6 years ago

uyv-tbrothers commented 6 years ago

Hey all,

When I run

extractmpl -v -r 935x198+114+1099 templates/data2.jpg input/filled.jpg mpls/data.jpg

where templates/data2.jpg looks like this: [image: data2]

and input/filled.jpg looks like this: [image: example]

Both scans are 300 dpi, black and white.

I sometimes get

INFO:root:found 125 features in template
INFO:root:found 126 features in image
INFO:root:insufficient model inliers (1 of 2181, min=37)

Sometimes I get the mpl file, and sometimes I get that "insufficient model inliers" notification and no mpl file. Right now, the mpl is extracted about 50% of the time.

I have tried expanding and shrinking the -r window, removing the bubbles from the template, and making the template area bigger and smaller, but got even less consistent results. I have also tried 600 dpi, which improved other scan areas, but this one is still the least consistent.

What does "insufficient model inliers" mean, and how can I more reliably extract this template from this image?

uyv-tbrothers commented 6 years ago

Love the idea of this project, btw! Thanks for releasing it!

uyv-tbrothers commented 6 years ago

Maybe the letters in the bubbles are a problem?

When I run

extractmpl -v -r 91x66+1610+2721 templates/data2-highres-r.jpg input/filled-600.jpg mpls/data.jpg

where templates/data2-highres-r.jpg looks like this: [image: data2-highres-r]

and input/filled-600.jpg looks similar to filled.jpg above, I get

INFO:root:found 12 features in template
INFO:root:found 12 features in image
INFO:root:insufficient features in template (12, min=20)

whereas when I run a similar command on a number like 1, which looks like this: [image: data2-highres-one], I get:

INFO:root:found 0 features in template
INFO:root:found 0 features in image
INFO:root:insufficient features in template (0, min=20)

Maybe this means that the letters are being detected as "features", which raises the feature count and so makes it less likely that the algorithm will succeed?

wavexx commented 6 years ago

I'm looking at this now.

I see a few issues in your case, though. The template actually appears several times in the image. This is no different from trying to find a single checkbox or numeral on a page filled with them: which one should it be matched against?
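To illustrate why repetition hurts matching, here is a minimal sketch of nearest-neighbour descriptor matching with Lowe's ratio test, a common filter in OpenCV-style pipelines. Whether extractmpl does anything like this is an assumption, and the descriptors are made-up 3-element vectors rather than real detector output. When the same block is printed twice, a template feature has two almost equally good candidates, so the match is either rejected (with the ratio test) or picked arbitrarily (without it).

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_match(query, candidates, ratio=0.75):
    """Index of the best candidate if it clearly beats the runner-up, else None."""
    ranked = sorted((l2(query, c), i) for i, c in enumerate(candidates))
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][1]
    (d1, best), (d2, _) = ranked[0], ranked[1]
    # Accept only if the best distance is clearly smaller than the second best.
    return best if d1 < ratio * d2 else None


if __name__ == "__main__":
    feature = (1.0, 0.0, 0.0)  # made-up descriptor of one template feature
    # Block printed twice on the page: two near-identical candidates.
    repeated = [(1.0, 0.0, 0.01), (1.0, 0.01, 0.0), (0.0, 1.0, 0.0)]
    # Block printed once.
    unique = [(1.0, 0.0, 0.01), (0.0, 1.0, 0.0)]
    print(ratio_test_match(feature, repeated))  # None: ambiguous
    print(ratio_test_match(feature, unique))    # 0
```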

There is currently no support for this scenario. Depending on the order of the detected features, the RANSAC search will likely get confused between the various possible positions of the template in the source.
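A toy version of the RANSAC step helps show what "insufficient model inliers" means. This sketch assumes a pure-translation model (the real tool presumably fits a richer transform), and MIN_INLIERS is a hypothetical value; how extractmpl computes its own threshold (min=37 in the log above) is not shown in this thread. When half the matches point at one copy of the block and half at another, no single hypothesis explains more than half of them, so the inlier count stays low and the fit is rejected. Which copy "wins" depends on which correspondences happen to be sampled, which is consistent with the run-to-run inconsistency reported above.

```python
import random

MIN_INLIERS = 15  # hypothetical threshold, not extractmpl's actual rule


def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """Estimate a 2D translation from ((tx, ty), (ix, iy)) matches with RANSAC.

    Returns (best_offset, inlier_count).
    """
    rng = random.Random(seed)
    best, best_count = None, 0
    for _ in range(iters):
        # A translation needs only one correspondence as its minimal sample.
        (tx, ty), (ix, iy) = rng.choice(matches)
        dx, dy = ix - tx, iy - ty
        # Count matches that agree with this hypothesis within `tol` pixels.
        count = sum(
            1
            for (ax, ay), (bx, by) in matches
            if abs(bx - ax - dx) <= tol and abs(by - ay - dy) <= tol
        )
        if count > best_count:
            best, best_count = (dx, dy), count
    return best, best_count


if __name__ == "__main__":
    # 20 template features; the block appears twice on the page,
    # at offsets (100, 200) and (500, 200).
    tmpl = [(i * 10, (i * 7) % 50) for i in range(20)]
    # Ambiguous matching: half the features land on each copy.
    split = [((x, y), (x + (100 if i % 2 == 0 else 500), y + 200))
             for i, (x, y) in enumerate(tmpl)]
    offset, n = ransac_translation(split)
    print(offset, n, "ok" if n >= MIN_INLIERS else "insufficient model inliers")
    # Unambiguous matching: every feature lands on the same copy.
    single = [((x, y), (x + 100, y + 200)) for x, y in tmpl]
    print(ransac_translation(single))
```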

What you need here is to put the 9 blocks together in a single template. I assume you still have a clean page to try this way (otherwise, post it here).
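If the nine blocks are combined into one template, one way to choose the crop is the bounding rectangle of all of them on the clean page. The sketch below assumes the -r argument uses the WxH+X+Y geometry form seen in the commands above (width x height + x offset + y offset, as in ImageMagick); the helper names and the block coordinates in the example are hypothetical.

```python
import re

# WxH+X+Y, e.g. "935x198+114+1099" (assumed meaning: width, height, x, y)
GEOMETRY = re.compile(r"^(\d+)x(\d+)\+(\d+)\+(\d+)$")


def parse_geometry(s):
    """Parse 'WxH+X+Y' into (x, y, w, h)."""
    m = GEOMETRY.match(s)
    if m is None:
        raise ValueError(f"not a WxH+X+Y geometry: {s!r}")
    w, h, x, y = map(int, m.groups())
    return x, y, w, h


def union_geometry(geometries):
    """Smallest WxH+X+Y rectangle containing every given rectangle."""
    boxes = [parse_geometry(g) for g in geometries]
    x0 = min(x for x, _, _, _ in boxes)
    y0 = min(y for _, y, _, _ in boxes)
    x1 = max(x + w for x, _, w, _ in boxes)
    y1 = max(y + h for _, y, _, h in boxes)
    return f"{x1 - x0}x{y1 - y0}+{x0}+{y0}"


if __name__ == "__main__":
    # Hypothetical positions of three of the blocks on a 600 dpi page.
    blocks = ["91x66+1610+2721", "91x66+1710+2721", "91x66+1810+2721"]
    print(union_geometry(blocks))  # 291x66+1610+2721
```

This only computes the rectangle; the point of combining the blocks is that the larger region should occur once on the page, so the matcher no longer has several equally good placements to choose from.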