Closed FloCiaglia closed 3 years ago
UPDATES: Local thresholding doesn't work as expected. It should automatically detect the right threshold value t, but the value it chooses is always too low and the image ends up mostly black.
Hardcoding t to 0.63 seemed to work for most fields. However, this changes as we test more forms. More updates to come.
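For reference, the fixed-threshold step can be sketched like this. This is a minimal sketch, not the actual pipeline code: it assumes a grayscale image already normalized to [0, 1], and the `binarize` helper name is hypothetical.

```python
import numpy as np

def binarize(gray, t=0.63):
    """Binarize a grayscale image (values in [0, 1]) with a fixed threshold t.

    Pixels brighter than t become 1 (background/paper), the rest 0 (ink).
    """
    return (gray > t).astype(np.uint8)

# tiny synthetic example: light background with a few darker "ink" pixels
img = np.array([[0.9, 0.9, 0.2],
                [0.9, 0.1, 0.9],
                [0.3, 0.9, 0.9]])
binary = binarize(img)
```

With a hardcoded t, dark strokes map to 0 and the light background to 1, which is exactly why a t that is too low turns most of the image black.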
The second approach I took was inspecting the image with a histogram. When building a histogram of a grayscale image, we see a peak in the second half of the histogram, indicating where the majority of the pixels sit on the intensity scale. The ideal threshold for a given image lies somewhere near the bottom-left side of that peak.
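Computing that histogram can be sketched as follows. This is a hedged example with a synthetic image (the real forms are loaded elsewhere in the pipeline); the `gray_histogram` helper is hypothetical.

```python
import numpy as np

def gray_histogram(gray, bins=256):
    """Histogram of a grayscale image with values in [0, 1]."""
    counts, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    return counts, edges

# synthetic image: mostly light background, plus a small dark "ink" patch
rng = np.random.default_rng(0)
gray = np.clip(rng.normal(0.8, 0.05, size=(64, 64)), 0.0, 1.0)
gray[10:15, 10:15] = 0.1  # dark patch

counts, edges = gray_histogram(gray)
peak_bin = int(np.argmax(counts))
# the dominant peak sits in the second (bright) half of the histogram,
# since most pixels are background paper
```

The bin index of the peak tells you where the background intensity sits; the candidate threshold is then searched to the left of it.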
Using trial and error, I found that the last divot (local minimum) before the peak is often a good enough threshold, so I implemented an algorithm to pick the threshold value from each image's histogram. FOUND ANOTHER PROBLEM with that: this method of finding the threshold value is not accurate enough. The value is often too high, which makes the letter strokes too thick. The letters then touch each other, and the segmentation step treats them as a single character.
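The divot search described above can be sketched like this. This is a reconstruction of the idea, not the actual implementation, and the `divot_threshold` name is hypothetical; it assumes a histogram produced by `np.histogram`.

```python
import numpy as np

def divot_threshold(counts, edges):
    """Pick the last local minimum ("divot") before the histogram's tallest peak.

    The background forms a large peak in the bright half of the histogram;
    the valley just left of it is taken as the ink/paper cutoff.
    """
    peak = int(np.argmax(counts))
    # walk left from the peak while the counts keep dropping;
    # the bin we stop at is the last local minimum before the peak
    i = peak
    while i > 0 and counts[i - 1] <= counts[i]:
        i -= 1
    # return that bin's left edge as the threshold value
    return float(edges[i])

# toy bimodal histogram: small dark mode, valley, tall bright peak
counts = np.array([5, 9, 4, 2, 3, 8, 20, 12])
edges = np.linspace(0.0, 1.0, 9)
t = divot_threshold(counts, edges)
```

As noted above, this tends to land too high: the valley floor is flat and noisy on real scans, so the chosen edge drifts toward the peak and thickens the strokes.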
FINAL UPDATE
None of the preprocessing methods explored in this task have yielded better results than the original method. We will keep the original method in the pipeline and the new methods in the testing directories for future analysis.
Once again, the problem with this field is the image preprocessing step. I am taking some time to research other ways to preprocess an image. Maybe changing libraries could give us different results.
I am exploring this article to test image preprocessing using the scikit-image library instead of OpenCV.
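scikit-image ships several automatic threshold selectors (e.g. `skimage.filters.threshold_otsu` and `skimage.filters.try_all_threshold` for a side-by-side comparison), which could replace the hand-rolled divot search. As a rough sketch of what Otsu's method computes, here is a pure-NumPy version (so it runs without scikit-image installed); in practice you would call the library function instead.

```python
import numpy as np

def otsu_threshold(gray, bins=256):
    """Otsu's method: pick the threshold that maximizes the between-class
    variance of the resulting ink/paper split."""
    counts, edges = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = counts / counts.sum()                    # bin probabilities
    centers = (edges[:-1] + edges[1:]) / 2       # bin midpoints
    best_t, best_var = 0.0, -1.0
    for k in range(1, bins):
        w0, w1 = p[:k].sum(), p[k:].sum()        # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:k] * centers[:k]).sum() / w0   # class means
        mu1 = (p[k:] * centers[k:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, float(edges[k])
    return best_t

# bimodal toy image: half dark ink (0.2), half light paper (0.8);
# the chosen threshold should fall between the two modes
gray = np.array([0.2] * 50 + [0.8] * 50)
t = otsu_threshold(gray)
```

Unlike the divot heuristic, Otsu optimizes a global criterion, so it is less sensitive to noisy valley floors; whether it handles these forms better is exactly what the testing directories are for.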