Udayraj123 / OMRChecker

Evaluate OMR sheets fast and accurately using a scanner 🖨 or your phone 🤳.
MIT License
758 stars 314 forks

[Feature][Core] Calculate thresholding confidence using data #39

Open Udayraj123 opened 2 years ago

Udayraj123 commented 2 years ago

The core logic of OMRChecker revolves around finding the correct separation between Marked and Unmarked bubbles. We want to let the user know whether that separation has been determined confidently.

Image

In the above image there are two possible thresholds based on the jumps in the histogram. In such cases the confidence metric will be useful to separate bad quality images.
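To make the idea concrete, here is a minimal sketch of one possible confidence score. It is not OMRChecker's implementation: the function name `jump_confidence` and the scoring formula are assumptions for illustration. It compares the two largest gaps between consecutive sorted bubble intensities; when two gaps are of similar size (two candidate thresholds), the score drops towards 0.

```python
from typing import Sequence


def jump_confidence(bubble_means: Sequence[float]) -> float:
    """Hypothetical score in [0, 1]: how dominant the largest jump is.

    bubble_means holds one mean pixel intensity per bubble.
    1.0 means a single clear jump; values near 0 mean a second jump
    of almost the same size exists, so the threshold is ambiguous.
    """
    values = sorted(bubble_means)
    # Gaps between consecutive sorted intensities, largest first.
    jumps = sorted((b - a for a, b in zip(values, values[1:])), reverse=True)
    if not jumps or jumps[0] <= 0:
        return 0.0  # fewer than two bubbles, or all values identical
    second = jumps[1] if len(jumps) > 1 else 0.0
    return 1.0 - second / jumps[0]


# Made-up intensities: one clear gap vs. two competing gaps.
clear = jump_confidence([50, 52, 55, 200, 205, 210])
ambiguous = jump_confidence([50, 52, 120, 125, 200, 205])
```

With these made-up values, `clear` is close to 1 while `ambiguous` is close to 0, which is the kind of signal that could be used to flag bad quality images.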

More references in Rich Visuals section.

Note: this issue is marked with the hacktoberfest label. Follow #hacktoberfest-discussions on Discord for further details.

grgkaran03 commented 2 years ago

Hi, I would like to take up this issue. Can you please tell me the approach to how I can start working on this?

Udayraj123 commented 2 years ago

Hi @grgkaran03, thanks for showing interest. Let's discuss it on Discord, and then you can share a brief summary of the things to do in a comment here. Ping me in the channel mentioned in the description.

Udayraj123 commented 1 year ago

Hi @grgkaran03, any updates? Do you need help with anything?

Udayraj123 commented 1 year ago

This task will be covered by a PR with ongoing work to improve the debugging experience.

Udayraj123 commented 8 months ago

Sharing a sample histogram where the MIN_JUMP configuration seemed to be ineffective:

image image image

and somehow the global threshold is also too high because the overall image is bright.

Udayraj123 commented 8 months ago

Analysis: The global threshold logic was not working for this q-vals plot because the minimum value was too high. (q-vals is the list of mean pixel values of all bubbles in the OMR template.)

image

Setting it to 100 is also not separating the red and green lines (ideally the red line should auto-correct itself to the first large gap).

This happens when there's no sharp jump between two consecutive values in the above histogram.

A confidence metric is needed when there is no clear "first large jump", as a few bubbles near that threshold are likely to be detected wrongly (unless, of course, a local threshold saves that case).

image

For a particular set of images, we can configure the MIN_JUMP parameter to solve this via config.json:

{
  "threshold_params": {
    "MIN_JUMP": 15
  }
}

image

But reducing the MIN_JUMP increases wrong detections for images with shadows/low contrast shades.
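The trade-off can be illustrated with a minimal sketch of "first large jump" thresholding. This is an assumption-laden illustration, not the project's actual code: the function name, the default `min_jump` of 25, and the fallback `default_threshold` of 200 are all hypothetical. Darker (lower) values are assumed to be marked bubbles.

```python
from typing import Sequence


def first_large_jump_threshold(
    q_vals: Sequence[float],
    min_jump: float = 25,            # hypothetical default, not from the project
    default_threshold: float = 200,  # hypothetical fallback value
) -> float:
    """Return the midpoint of the first gap of at least min_jump.

    q_vals are mean bubble intensities. If no gap between consecutive
    sorted values reaches min_jump, the fallback threshold is returned.
    """
    values = sorted(q_vals)
    for darker, lighter in zip(values, values[1:]):
        if lighter - darker >= min_jump:
            return (darker + lighter) / 2
    return default_threshold


# Made-up values with a gradual ramp and no sharp jump.
gradual = [100, 110, 120, 130, 140]
with_default_jump = first_large_jump_threshold(gradual)             # falls back to 200
with_small_jump = first_large_jump_threshold(gradual, min_jump=10)  # 105.0
```

In this sketch, lowering `min_jump` lets the very first small gap define the threshold, so small intensity differences caused by shadows could be mistaken for the marked/unmarked boundary.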

For example, in the above plot, positions 40-50 may have marked bubbles with low contrast. The local thresholding technique should resolve the issue most of the time, but OMRChecker is less confident about such cases.

image

The confidence metric should help us identify such cases and potentially find a solution. We can try labelling the questions in the plot itself to gather some insights.

Udayraj123 commented 8 months ago

Added code to support field labels in the intensity plot to understand the ambiguity better.

image image

Udayraj123 commented 7 months ago

Turns out the confidence metric to show local vs global threshold disparity is already showing results!

In this scan from the community samples we see an ambiguous bubble mark (see Q.131):

image

It was found when looking at the confidence metrics output:

image

Such bubbles may require human intervention or better tuning to keep the output consistent across images.
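A local vs global disparity check of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the code behind the output shown above: the function names, the per-field largest-jump rule, the `min_jump` default, and the sample values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def local_threshold(values: Sequence[float], global_threshold: float,
                    min_jump: float) -> float:
    """Midpoint of the largest gap within one field, if it is large enough.

    Falls back to the global threshold when the field has no clear jump.
    """
    ordered = sorted(values)
    best_jump, best_mid = 0.0, global_threshold
    for darker, lighter in zip(ordered, ordered[1:]):
        jump = lighter - darker
        if jump > best_jump:
            best_jump, best_mid = jump, (darker + lighter) / 2
    return best_mid if best_jump >= min_jump else global_threshold


def find_disparities(fields: Dict[str, List[float]], global_threshold: float,
                     min_jump: float = 25.0) -> List[Tuple[str, int]]:
    """Return (field label, bubble index) pairs where the local and
    global thresholds disagree on whether the bubble is marked."""
    flagged = []
    for label, values in fields.items():
        local = local_threshold(values, global_threshold, min_jump)
        for index, value in enumerate(values):
            # A bubble counts as marked when it is darker than the threshold.
            if (value < local) != (value < global_threshold):
                flagged.append((label, index))
    return flagged


# Made-up intensities: the second field has a faint mark (150) that the
# global threshold (140) misses but the field's own jump picks up.
sample_fields = {"q130": [60, 200, 205, 210], "q131": [150, 215, 220, 218]}
flagged = find_disparities(sample_fields, global_threshold=140)
```

Here only the faint bubble in the second field is flagged, so it can be queued for human review or used as a hint that the thresholds need tuning.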

Master branch vs. new output:
image image

We've decided to respect the user's intention to mark, so the bubble will now be treated as marked even if it is not fully filled.

Note: If your images contain bad-quality prints where the printed characters ('B' in the above case) are non-uniformly thick/bold, they may still get detected as marked bubbles.