allenai / mmda

multimodal document analysis
Apache License 2.0

Breaking Change in PdfPlumber Causing Bibentry_detector and bibentry-predictor builds to fail #203

Closed. Whattabatt closed this issue 1 year ago.

Whattabatt commented 1 year ago

@geli-gel @soldni @kyleclo

The builds are failing with "AttributeError: module 'pdfplumber.utils' has no attribute 'WordExtractor'".

https://s2build.inf.ai2.in:8543/buildConfiguration/SemanticScholar_Contin_Data_TimoServicesBibentryPredictorMmdaCi/2358135?buildTab=log&focusLine=3403&linesState=456
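
A quick way to confirm this kind of failure locally is to probe the installed module for the attribute. This is a minimal sketch, not code from mmda; `module_has_attribute` is a hypothetical helper name.

```python
import importlib


def module_has_attribute(module_name: str, attribute: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or cannot be imported.
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    # In an environment that shows the error above, this should print False.
    print(module_has_attribute("pdfplumber.utils", "WordExtractor"))
```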

kyleclo commented 1 year ago

@Whattabatt can u point me to how you're doing the pdfplumber version pinning?

Whattabatt commented 1 year ago

> @Whattabatt can u point me to how you're doing the pdfplumber version pinning?

My mistake, I was confused by a comment Luca made in the thread. No code change to pin pdfplumber has been made yet.

kyleclo commented 1 year ago

From call:

  1. pdfplumber needs to be pinned to a specific version, 0.7.4 or 0.7.6. That should get rid of the DWP (DictionaryWordPredictor) problem. Looking into the code doing the DWP import.
  2. @cmwilhelm found that the import is in the application code:
    /usr/local/lib/python3.8/dist-packages/ai2_internal/bibentry_predictor_mmda/interface.py:14: in <module>
      from mmda.predictors.hf_predictors.bibentry_predictor.predictor import BibEntryPredictor
    /usr/local/lib/python3.8/dist-packages/mmda/predictors/__init__.py:5: in <module>
      from mmda.predictors.heuristic_predictors.dictionary_word_predictor import DictionaryWordPredictor
    /usr/local/lib/python3.8/dist-packages/mmda/predictors/heuristic_predictors/dictionary_word_predictor.py:32: in <module>
      from mmda.parsers import PDFPlumberParser
    /usr/local/lib/python3.8/dist-packages/mmda/parsers/__init__.py:1: in <module>
      from mmda.parsers.pdfplumber_parser import PDFPlumberParser
    /usr/local/lib/python3.8/dist-packages/mmda/parsers/pdfplumber_parser.py:19: in <module>
      class WordExtractorWithFontInfo(ppu.WordExtractor):
    E   AttributeError: module 'pdfplumber.utils' has no attribute 'WordExtractor'
  3. Add a test in MMDA.src that exercises all available predictors.
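
Items 1 and 3 from the call can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the change that was actually merged: `ALLOWED_PDFPLUMBER_VERSIONS`, `installed_version`, `is_allowed_version`, and `find_import_failures` are hypothetical names, and the allowed set simply repeats the two versions named in item 1.

```python
import importlib
import pkgutil
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumption: the two versions named in item 1 of the call notes.
ALLOWED_PDFPLUMBER_VERSIONS = ("0.7.4", "0.7.6")


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_allowed_version(
    version: Optional[str],
    allowed: Iterable[str] = ALLOWED_PDFPLUMBER_VERSIONS,
) -> bool:
    """Return True only if `version` is one of the explicitly allowed versions."""
    return version is not None and version in set(allowed)


def find_import_failures(package_name: str) -> Dict[str, str]:
    """Import a package and every submodule; map failing module names to errors."""
    failures: Dict[str, str] = {}
    try:
        package = importlib.import_module(package_name)
    except Exception as exc:  # the package's own __init__ may already fail
        return {package_name: f"{type(exc).__name__}: {exc}"}

    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return failures  # a plain module has no submodules to walk

    def record(name: str) -> None:
        # Called by walk_packages when it cannot import a subpackage.
        failures.setdefault(name, "failed while walking the package")

    for info in pkgutil.walk_packages(
        search_path, prefix=package_name + ".", onerror=record
    ):
        try:
            importlib.import_module(info.name)
        except Exception as exc:
            failures[info.name] = f"{type(exc).__name__}: {exc}"
    return failures
```

A CI test could then assert that `is_allowed_version(installed_version("pdfplumber"))` holds and that `find_import_failures("mmda.predictors")` returns an empty dict. With the traceback above, the second check would report the `AttributeError` raised while importing `mmda.predictors`.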

Whattabatt commented 1 year ago

Bibentry CI is passing after Chris's update. Problem solved.