allenai / mmda

multimodal document analysis
Apache License 2.0
158 stars, 18 forks

Cap PDFPlumber version #199

Closed: kyleclo closed this 1 year ago

kyleclo commented 1 year ago

Cap the pdfplumber version, since the 0.8.0 release moves Word…Extractor into a submodule.
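Capping a dependency usually means giving it an upper bound in the package's requirements, for example something like `pdfplumber<0.8.0` (the exact specifier this PR used is not shown here). The sketch below shows what such a bound admits. It is a minimal illustration that assumes plain numeric `X.Y.Z` version strings with no pre-release suffixes, and the helper names `parse_release` and `within_cap` are hypothetical, not part of mmda or pdfplumber.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a plain numeric version such as '0.7.6' into (0, 7, 6)."""
    # Assumption: no pre-release or dev suffixes like '0.8.0rc1'.
    return tuple(int(part) for part in version.split("."))


def within_cap(installed: str, cap: str = "0.8.0") -> bool:
    """Return True if `installed` satisfies the upper bound '< cap'."""
    # Tuples compare element by element, so (0, 7, 6) < (0, 8, 0).
    return parse_release(installed) < parse_release(cap)


if __name__ == "__main__":
    for v in ["0.7.6", "0.8.0", "0.10.0"]:
        print(v, within_cap(v))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" sorts before "0.8.0", which would wrongly let a newer release through the cap.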

As you can see here, it now lives under a `text` submodule: https://github.com/jsvine/pdfplumber/blob/b6847ad4cd4e54e201c0301f66b2c1f3e914cdc0/pdfplumber/utils/text.py

That submodule did not exist in 0.7.6 or earlier versions; we were relying on the WordExtractor from: https://github.com/jsvine/pdfplumber/blob/v0.7.6/pdfplumber/utils.py
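An alternative to capping is to accept both layouts by trying the new import location first and falling back to the old one. The sketch below is a hypothetical helper (`import_first` is not an mmda or pdfplumber API); it assumes only what the two links above show, namely that `WordExtractor` sits in `pdfplumber/utils/text.py` at the linked 0.8.0-era commit and in the flat `pdfplumber/utils.py` in 0.7.6. mmda's actual fix is the line linked in the follow-up comment and may look different.

```python
import importlib
from typing import Any, Iterable, List, Tuple


def import_first(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from the (module, name) pairs."""
    errors: List[str] = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Record why this location failed, then try the next one.
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("no candidate location worked: " + "; ".join(errors))


# Newer layout (utils/text.py) first, then the older flat utils.py module.
WORD_EXTRACTOR_LOCATIONS = [
    ("pdfplumber.utils.text", "WordExtractor"),
    ("pdfplumber.utils", "WordExtractor"),
]

if __name__ == "__main__":
    try:
        WordExtractor = import_first(WORD_EXTRACTOR_LOCATIONS)
        print("found", WordExtractor)
    except ImportError as err:
        print(err)
```

On 0.7.6, importing `pdfplumber.utils.text` raises `ModuleNotFoundError` (a subclass of `ImportError`) because `utils` is a plain module rather than a package, so the helper moves on to the second location.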

kyleclo commented 1 year ago

nvm, cancelling this PR: @soldni already fixed it in https://github.com/allenai/mmda/blob/065e5681f2617a0e0ca6d19e4ea1694a79b42d9a/src/mmda/parsers/pdfplumber_parser.py#L8