Open nonprofittechy opened 2 years ago
This might be a regression in #55: we'll look above the horizontal line, see that something is in the middle of it, and stop.
IMO this isn't a good case for auto field detection. The lines are primarily presentational, not semantic. And for a form like:
To: From:
Fax: Fax:
we wouldn't be able to place any fields. We would have to hand-tune things heavily to work for just this PDF. Marking wontfix, but open to debate.
Fair enough--I'm willing to revisit if we run into a lot of similar PDFs in the wild, but this sample wasn't originally intended to be used on a computer.
Just noting a regression. The attached PDF now has no fields recognized at all. I'm not sure that's desirable.
I think we'll see other forms in the wild that have a prompt, a colon and a large empty space that should get turned into a field until it runs into the next word or the end of the page.
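To make the proposed heuristic concrete, here is a minimal sketch of "prompt, colon, then blank space until the next word or the end of the page". It is not FormFyxer code: the `Word` class, the `fields_after_colons` function, and the `min_width` and `gap` defaults are all hypothetical names and values chosen for illustration, and it assumes the words of one text line have already been extracted with their horizontal extents.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """Hypothetical word box on a single text line (horizontal extent only)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge


def fields_after_colons(
    line_words: List[Word],
    page_right: float,
    min_width: float = 40.0,  # assumed minimum blank width worth a field
    gap: float = 3.0,         # assumed padding between text and field
) -> List[Tuple[float, float]]:
    """Return (x0, x1) spans for blank space following words that end in ':'."""
    words = sorted(line_words, key=lambda w: w.x0)
    spans = []
    for i, word in enumerate(words):
        if not word.text.endswith(":"):
            continue
        start = word.x1 + gap
        # The field runs until the next word on the line, or the page margin.
        end = words[i + 1].x0 - gap if i + 1 < len(words) else page_right
        if end - start >= min_width:
            spans.append((start, end))
    return spans


line = [Word("To:", 50.0, 70.0), Word("From:", 300.0, 335.0)]
print(fields_after_colons(line, page_right=560.0))
```

The `min_width` check is what keeps an ordinary sentence containing a colon from producing a field; the right threshold would have to be tuned against real forms.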
It's easier to delete a field currently than to manually add one using any of our tools.
For more context, I thought I would check Washington State which has a lot of forms without form fields: https://www.courts.wa.gov/forms/?fa=forms.contribute&formID=6 It looks like many of these forms will use lines to help people handwrite a response.
Going to leave this open to encourage us to track any real forms where recognizing a blank space should turn into form fields.
> Just noting a regression. The attached PDF now has no fields recognized at all. I'm not sure that's desirable.
That was noted in https://github.com/SuffolkLITLab/FormFyxer/issues/30#issuecomment-1233615477, before this issue was closed.
> For more context, I thought I would check Washington State which has a lot of forms without form fields: https://www.courts.wa.gov/forms/?fa=forms.contribute&formID=6 It looks like many of these forms will use lines to help people handwrite a response.
Looking at those, they all have normal PDF lines; this issue was specifically about lines that extend underneath labels for other fields and for colon-blank space extensions, which I don't see either of in the forms at that link.
> I think we'll see other forms in the wild that have a prompt, a colon and a large empty space that should get turned into a field until it runs into the next word or the end of the page.
Have we actually found any in the wild like that, though? I'm realizing that we haven't seen any forms like this, likely because such a form would be unusable as a printed document: you can't tell where information is supposed to be written. I'm still of the opinion that this will be pretty hard to get working reliably, and that the effort would be better spent making it easier to manually add fields in our tools than on overestimating the number of fields by a long shot. Also note that the idea that we should overestimate because it's easier to delete is in tension with https://github.com/SuffolkLITLab/FormFyxer/issues/110, where we're trying not to overestimate.
It's probably a good idea at this point to get 2 or 3 forms from each jurisdiction and try to evaluate how our tools do with each way of marking fields, so we have a more comprehensive view. I made #117 for that.
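One way such an evaluation could be scored, so that over-estimating and under-estimating both show up in the numbers, is precision and recall of detected field boxes against hand-labeled ones. The sketch below is an assumption about how that might look, not an existing FormFyxer function: `iou`, `score_fields`, the `(x0, y0, x1, y1)` box format, the greedy matching, and the 0.5 overlap threshold are all illustrative choices.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: x0, y0, x1, y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def score_fields(
    detected: List[Box], expected: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedily match detected boxes to expected ones; return (precision, recall)."""
    unmatched = list(expected)
    hits = 0
    for box in detected:
        # Pick the still-unmatched expected box that overlaps this one the most.
        best = max(unmatched, key=lambda e: iou(box, e), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            hits += 1
    precision = hits / len(detected) if detected else 1.0
    recall = hits / len(expected) if expected else 1.0
    return precision, recall
```

Low precision would indicate over-estimation (many fields to delete), and low recall would indicate fields that have to be added by hand, which is the trade-off being debated above.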
I think in our current workflow, recognizing more fields is better. So maybe a few heuristic tweaks can be added from this test file?
Fax_cover_sheet_no_fields.pdf
turned into this:
fields_file.pdf
The long lines tricked the field recognizer into adding one long field that spanned the whole page. The `:` should be a clue to start a new field. Similarly, the `Comments:` text followed by a big empty space should be a signal to add a form field.

BTW, Adobe Acrobat doesn't recognize any fields in this PDF, so our rules might already be smarter.
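As a rough illustration of the first tweak, the sketch below cuts one long horizontal line into separate field spans wherever a label sits on top of it, instead of emitting a single page-wide field. It is a hypothetical helper, not part of FormFyxer: `split_line_at_labels`, the `(x0, x1)` span format, and the `min_width` default are assumptions, and it presumes the labels overlapping the line have already been found.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # assumed format: (x0, x1) along one horizontal line


def split_line_at_labels(
    line: Span, labels: List[Span], min_width: float = 20.0
) -> List[Span]:
    """Cut one long rule into the gaps left between label boxes that sit on it."""
    line_x0, line_x1 = line
    fields = []
    cursor = line_x0
    for label_x0, label_x1 in sorted(labels):
        if label_x1 <= cursor:
            continue  # label lies entirely before the current position
        if label_x0 >= line_x1:
            break  # label starts past the end of the line
        if label_x0 - cursor >= min_width:
            fields.append((cursor, label_x0))
        cursor = max(cursor, label_x1)
    # Whatever remains after the last label is the final field.
    if line_x1 - cursor >= min_width:
        fields.append((cursor, line_x1))
    return fields


# One rule running under both "To:" and "From:" on the same row.
print(split_line_at_labels((50.0, 550.0), [(50.0, 70.0), (300.0, 335.0)]))
```

With no labels on the line, the function returns the whole line as one field, so ordinary underlines would be unaffected by this rule.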