SuffolkLITLab / FormFyxer

A tool for learning about and pre-processing forms
MIT License

Decrease checkbox detection sensitivity #110

Open nonprofittechy opened 1 year ago

nonprofittechy commented 1 year ago

Example with 5 false positives 8d0054ddb4e29e2e76cf7bc7319b50c0.pdf

nonprofittechy commented 1 year ago

Another one: scan artifacts or a large letter "O" produced a false positive: 3764ccaf37f1814638c57ff3e3c381ba.pdf

BryceStevenWilley commented 1 year ago

(I'm assuming this is related to https://github.com/SuffolkLITLab/form-explorer/issues/61)

It should be flexible enough to change really easily, but I made the assumption that we'd be running on things that are forms. If it doesn't find any fields, it'll try again with a smaller box. It's probably worth having split heuristics: one for "this is a form, get me all the things I want", and one for "idk what this is, does it have any fields?", which is what we're trying to do now.
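For illustration, the split described above might look roughly like the sketch below. This is not FormFyxer's actual code: `Box`, `filter_boxes`, `find_checkboxes`, the `known_form` flag, and both size thresholds are hypothetical names and values chosen only to show the idea. The smaller-box retry runs only when the caller already believes the document is a form, so the "does this have any fields?" case keeps the stricter threshold and is less likely to report false positives.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """A candidate rectangle found on a page (hypothetical type, units arbitrary)."""
    x: float
    y: float
    width: float
    height: float


def filter_boxes(candidates: List[Box], min_side: float) -> List[Box]:
    """Keep candidates whose shorter side is at least min_side."""
    return [b for b in candidates if min(b.width, b.height) >= min_side]


def find_checkboxes(
    candidates: List[Box],
    known_form: bool,
    min_side: float = 12.0,          # assumed strict threshold
    fallback_min_side: float = 8.0,  # assumed looser threshold for the retry
) -> List[Box]:
    """Apply the strict threshold first; retry with a smaller box only for known forms."""
    found = filter_boxes(candidates, min_side)
    if not found and known_form:
        # "This is a form, get me all the things I want": try again with a smaller box.
        found = filter_boxes(candidates, fallback_min_side)
    # "Does it have any fields?": no retry, so an empty result stays empty.
    return found
```

With a single 10x10 candidate, `find_checkboxes(candidates, known_form=True)` falls back to the looser threshold and returns it, while `known_form=False` returns an empty list.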

BryceStevenWilley commented 1 year ago

> Example with 5 false positives 8d0054ddb4e29e2e76cf7bc7319b50c0.pdf

This looks like a bug generally; none of those checkboxes line up with anything, and a few are completely off the page for me.