SuffolkLITLab / FormFyxer

A tool for learning about and pre-processing forms
MIT License

Form field recognizer creates overlapping fields and doesn't recognize all table cells or round radio buttons #31

Open nonprofittechy opened 2 years ago

nonprofittechy commented 2 years ago

civil_docketing_statement_no_labels.pdf

turns into

fields_file(1).pdf

The boxes with the doubled fields have a slight bevel (a gray line next to a black line). Maybe there should be some fuzzing heuristic to make sure that two fields aren't placed on top of each other, so that two lines that are very close together are treated as one?


Meanwhile, the nearby checkboxes with the same beveling aren't recognized as checkboxes yet.

nonprofittechy commented 2 years ago

Actually, looking closer, I don't think the problem is the bevel. It's recognizing the top and bottom lines of the "Role" boxes as two separate fields.
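
For illustration, here is a minimal sketch of the kind of fuzzing heuristic suggested above: treat two detected fields as one when they mostly overlap horizontally and sit within a small vertical distance of each other. This is not FormFyxer's actual code. The `(x, y, width, height)` box format, the function names, and the `y_tol` and `min_x_overlap` defaults are all assumptions made for the example and would need tuning against real PDFs.

```python
from typing import List, Tuple

# Hypothetical box format: (x, y, width, height) in page units.
Box = Tuple[float, float, float, float]


def _x_overlap_fraction(a: Box, b: Box) -> float:
    """Horizontal overlap as a fraction of the narrower box's width."""
    overlap = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    narrower = min(a[2], b[2])
    if overlap <= 0 or narrower <= 0:
        return 0.0
    return overlap / narrower


def _vertical_gap(a: Box, b: Box) -> float:
    """Distance between the boxes' vertical extents (negative if they overlap)."""
    return max(a[1], b[1]) - min(a[1] + a[3], b[1] + b[3])


def _union(a: Box, b: Box) -> Box:
    """Smallest box that covers both inputs."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_close_fields(
    boxes: List[Box], y_tol: float = 12.0, min_x_overlap: float = 0.8
) -> List[Box]:
    """Merge fields that overlap horizontally and are within y_tol vertically.

    The tolerance values are guesses for illustration, not tuned settings.
    """
    merged: List[Box] = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for i, kept in enumerate(merged):
            if (
                _x_overlap_fraction(kept, box) >= min_x_overlap
                and _vertical_gap(kept, box) <= y_tol
            ):
                merged[i] = _union(kept, box)
                break
        else:
            merged.append(box)
    return merged


# Top and bottom lines of one box, plus an unrelated field to the right.
fields = [(100, 200, 150, 2), (100, 210, 150, 2), (300, 200, 150, 2)]
print(merge_close_fields(fields))
```

With these made-up coordinates, the two stacked lines collapse into a single field while the field to the right is left alone. A real tolerance would probably have to scale with page resolution, and a merge this simple could wrongly join genuinely separate rows that are tightly spaced.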

BryceStevenWilley commented 2 years ago

Most of this should be addressed in #55. I'll have to test it on the given PDFs to make sure we're tuned for this case.