SuffolkLITLab / form-explorer

A set of tools for exploring the connections between blank and historic court forms.
https://suffolklitlab.org/form-explorer/

fix the reading level metric #34

Closed: colarusso closed this 1 year ago

colarusso commented 2 years ago

It's showing numbers that are too large, esp. after making some changes that I thought would help "fix" the issue. Obviously, I missed something.

colarusso commented 2 years ago

I'm not sure it's "fixed," given the limitations of reading level metrics for texts like these, but it's a little better.

colarusso commented 2 years ago

FWIW, I think we're now underestimating difficulty. Have to decide if we want to overestimate or underestimate. :/

colarusso commented 2 years ago

Should fix the error where reading level = 0: it should be reported as unknown instead.
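A minimal sketch of that mapping, assuming a score of exactly 0 means no usable text was scored rather than a genuinely easy form. The function names `normalize_reading_level` and `format_reading_level` are hypothetical and not taken from the form-explorer code.

```python
from typing import Optional


def normalize_reading_level(score: Optional[float]) -> Optional[float]:
    """Return None (unknown) when the score is missing or exactly 0.

    Assumption: a score of 0 is a sentinel for "nothing was scored",
    not a real reading level.
    """
    if score is None or score == 0:
        return None
    return score


def format_reading_level(score: Optional[float]) -> str:
    """Render a score for display, showing "unknown" instead of 0."""
    level = normalize_reading_level(score)
    return "unknown" if level is None else f"{level:.1f}"
```

Keeping the "unknown" case as `None` rather than 0 also keeps it out of any averages computed later.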

nonprofittechy commented 2 years ago

NOTE: we want to compare like to like. So removing the instructions page might be important to have scores that can be compared across jurisdictions. E.g., California has long instruction pages--most states don't have that at all.
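One rough way to do that, sketched under the assumption that the form text is already split into one string per page. The heuristic (the word "instructions" appearing in the first few non-empty lines) and the names `looks_like_instructions` and `form_pages_only` are hypothetical, and would likely need tuning per jurisdiction.

```python
from typing import List


def looks_like_instructions(page_text: str, head_lines: int = 3) -> bool:
    """Guess whether a page is an instruction page.

    Hypothetical heuristic: the word "instructions" appears in one of
    the first few non-empty lines of the page.
    """
    head = [ln.strip().lower() for ln in page_text.splitlines() if ln.strip()]
    return any("instructions" in ln for ln in head[:head_lines])


def form_pages_only(pages: List[str]) -> List[str]:
    """Drop pages that look like instructions, keeping page order."""
    return [p for p in pages if not looks_like_instructions(p)]
```

The reading level would then be computed on `"\n".join(form_pages_only(pages))` so that states with and without instruction pages are scored on the same kind of text.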

nonprofittechy commented 1 year ago

I think switching from pikepdf to pdfminer to get the text of the form solved this. pikepdf has a known issue with getting the text stream from a PDF (it assumes a specific encoding that is only coincidentally true for some PDFs).
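For reference, a minimal sketch of text extraction with pdfminer, assuming the pdfminer.six package and its `pdfminer.high_level.extract_text` function. The wrapper `extract_form_text` and the sanity check `has_usable_text` are hypothetical names, not the project's actual code.

```python
def extract_form_text(pdf_path: str) -> str:
    """Extract the text of a PDF using pdfminer.six."""
    # Imported here so this module still loads where pdfminer.six
    # is not installed.
    from pdfminer.high_level import extract_text

    return extract_text(pdf_path)


def has_usable_text(text: str, min_words: int = 5) -> bool:
    """Hypothetical sanity check: require a few purely alphabetic words.

    Badly decoded text tends to contain few such words, so a failure
    here suggests the reading level should not be trusted.
    """
    words = [w for w in text.split() if w.isalpha()]
    return len(words) >= min_words
```

A check like `has_usable_text(extract_form_text(path))` before scoring is one way to catch forms whose text came out garbled.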