SuffolkLITLab / FormFyxer

A tool for learning about and pre-processing forms
MIT License

get_existing_pdf_fields_with_context #79

Open BryceStevenWilley opened 1 year ago

BryceStevenWilley commented 1 year ago

Taking out an unfinished function until it can actually be finished. The idea is that it should return each field in the PDF, with all of the text surrounding it.

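from pathlib import Path
from typing import BinaryIO, Iterable, Union

from pikepdf import Pdf


def get_existing_pdf_fields_with_context(
    in_file: Union[str, Path, BinaryIO]
) -> Iterable:
    # Unfinished: returns the fields, but does not yet attach surrounding text.
    in_pdf = Pdf.open(in_file)
    # get_textboxes_in_pdf is defined elsewhere in FormFyxer; its result is unused so far.
    text_in_pdf = get_textboxes_in_pdf(in_file)
    # /FT is the field type and /T the partial field name; only top-level fields are listed.
    fields = [
        {"type": field.FT, "var_name": str(field.T), "all": field}
        for field in in_pdf.Root.AcroForm.Fields
    ]
    return fields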
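
One possible way to finish the missing step (pairing each field with the text around it) is sketched below. This is only an illustration, not the planned implementation: `TextBox`, `bbox_gap`, `attach_context`, the `"rect"` key, and the `max_gap` threshold are all hypothetical names and values. It assumes field rectangles and text boxes have already been reduced to `(x0, y0, x1, y1)` tuples in the same coordinate space and on the same page.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# (x0, y0, x1, y1); assumed to share one coordinate space and one page.
BBox = Tuple[float, float, float, float]


@dataclass
class TextBox:
    """Hypothetical stand-in for one block of text extracted from a page."""
    text: str
    bbox: BBox


def bbox_gap(a: BBox, b: BBox) -> float:
    """Shortest distance between two axis-aligned boxes (0.0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def attach_context(
    fields: List[Dict[str, Any]],
    textboxes: List[TextBox],
    max_gap: float = 50.0,  # arbitrary threshold, chosen only for illustration
) -> List[Dict[str, Any]]:
    """Return copies of `fields` with a "context" list of nearby text, nearest first."""
    result = []
    for field in fields:
        # Keep only text boxes within max_gap of the field's rectangle.
        nearby = [
            (bbox_gap(field["rect"], box.bbox), box.text)
            for box in textboxes
            if bbox_gap(field["rect"], box.bbox) <= max_gap
        ]
        nearby.sort(key=lambda pair: pair[0])
        result.append({**field, "context": [text for _, text in nearby]})
    return result
```

A distance threshold is the simplest rule to reason about. A real version would probably also need to handle multiple pages and prefer text to the left of or above the field, since that is where labels usually sit.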