Closed by colarusso 1 year ago
I'm not sure it's "fixed," given the limitations of reading-level scores for texts like these, but it's a little better.
FWIW, I think we're now underestimating difficulty. We have to decide whether we want to over- or underestimate. :/
Should fix the error where reading level = 0 so that it shows "unknown" instead.
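A minimal sketch of that fix, assuming a score of exactly 0 means no usable text was extracted rather than a real grade level. The function names `normalize_reading_level` and `describe_reading_level` are hypothetical and not taken from the project:

```python
from typing import Optional


def normalize_reading_level(level: Optional[float]) -> Optional[float]:
    """Map a missing or zero score to None, which stands for "unknown".

    Assumption: 0 is what the scorer returns when it had no text to work
    with, so it should not be reported as a genuine reading level.
    """
    if level is None or level == 0:
        return None
    return level


def describe_reading_level(level: Optional[float]) -> str:
    """Format a score for display, showing "unknown" instead of 0."""
    normalized = normalize_reading_level(level)
    if normalized is None:
        return "unknown"
    return f"grade {normalized:.1f}"


if __name__ == "__main__":
    print(describe_reading_level(0))
    print(describe_reading_level(7.5))
```

Keeping "unknown" as `None` internally, and only turning it into a string at display time, would let downstream averages skip those forms instead of being dragged toward 0.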
NOTE: we want to compare like to like, so removing the instructions page might be important for getting scores that can be compared across jurisdictions. E.g., California has long instruction pages; most states don't have them at all.
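One way this could look, as a rough sketch only: score just the pages that are not instructions. How instruction pages get identified is left open here. The function `comparable_text` and its `instruction_pages` argument (zero-based page indexes supplied by the caller) are hypothetical:

```python
from typing import Iterable, Sequence


def comparable_text(page_texts: Sequence[str], instruction_pages: Iterable[int]) -> str:
    """Join the text of every page except the ones marked as instructions.

    page_texts: one string per page, in page order.
    instruction_pages: zero-based indexes of pages to leave out
    (assumed to be known in advance; detection is not handled here).
    """
    skip = set(instruction_pages)
    kept = [text for index, text in enumerate(page_texts) if index not in skip]
    return "\n".join(kept)


if __name__ == "__main__":
    pages = ["How to fill out this form", "Name:", "Signature:"]
    print(comparable_text(pages, instruction_pages=[0]))
```

The reading level would then be computed on the returned text, so a form with several pages of instructions is scored on the same kind of content as a form with none.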
I think switching from pikepdf to pdfminer to get the text of the form solved this. pikepdf has a known issue with getting the text stream from a PDF (it assumes a specific encoding that is only coincidentally true for some PDFs).
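For reference, a small sketch of pulling form text with pdfminer, assuming the pdfminer.six package and its `pdfminer.high_level.extract_text` helper. The wrapper `extract_form_text`, its `extractor` parameter, and the whitespace collapsing are illustrative assumptions, not the project's actual code:

```python
from typing import Callable, Optional


def extract_form_text(path: str, extractor: Optional[Callable[[str], str]] = None) -> str:
    """Return the text of a PDF with runs of whitespace collapsed.

    extractor: any callable that takes a file path and returns text.
    When omitted, pdfminer.six's extract_text is used (imported lazily,
    so this module can be loaded without pdfminer installed).
    """
    if extractor is None:
        from pdfminer.high_level import extract_text  # requires pdfminer.six
        extractor = extract_text
    raw_text = extractor(path)
    # Assumption: layout whitespace is noise for readability scoring.
    return " ".join(raw_text.split())


if __name__ == "__main__":
    # A stand-in extractor, so the example runs without a real PDF.
    print(extract_form_text("form.pdf", extractor=lambda _path: "Name:\n\n  Date: "))
```

Passing the extractor in as an argument also makes it easy to run two extractors over the same form and compare their output.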
It's showing numbers that are too large, esp. after making some changes that I thought would help "fix" the issue. Obviously, I missed something.