Changes in PDFPlumberParser behavior since the citation mentions
model was last updated caused its integration tests to fail.
This changeset does two things:
1) Changes the assertions in the citation mentions TIMO integration
tests to compare text values rather than span positions.
2) Adds a new test to MMDA's suite that verifies PDFPlumberParser
stability. It will fail when our extracted text, tokenization, or bboxes
change at this layer, as a signal to reevaluate the rest of the DAG or
revert the changes.
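For reviewers, here is a rough sketch of the two ideas above. It is not the code in this changeset: `Mention`, `mention_texts`, `parser_fingerprint`, and the token dict shape are hypothetical stand-ins, not MMDA or TIMO types.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    """Hypothetical stand-in for a predicted citation mention."""
    start: int
    end: int
    text: str


def mention_texts(mentions: List[Mention]) -> List[str]:
    """Return the mention texts in sorted order, ignoring span offsets."""
    return sorted(m.text for m in mentions)


def parser_fingerprint(tokens: List[dict]) -> str:
    """Hash token text and rounded bboxes; any parser drift changes the digest."""
    canonical = [
        {"text": t["text"], "bbox": [round(v, 4) for v in t["bbox"]]}
        for t in tokens
    ]
    payload = json.dumps(canonical, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Same mentions, but offsets shifted by an upstream tokenization change.
expected = [Mention(10, 13, "[1]"), Mention(40, 53, "(Smith, 2020)")]
shifted = [Mention(12, 15, "[1]"), Mention(43, 56, "(Smith, 2020)")]

assert expected != shifted  # a span position based comparison breaks
assert mention_texts(expected) == mention_texts(shifted)  # text based holds

# Assumed token shape: {"text": str, "bbox": [x0, y0, x1, y1]}.
baseline = [{"text": "Hello", "bbox": [0.1, 0.2, 0.3, 0.4]}]
drifted = [{"text": "Hello", "bbox": [0.1, 0.2, 0.35, 0.4]}]

assert parser_fingerprint(baseline) != parser_fingerprint(drifted)
```

The text based check tolerates offset shifts from the parser, while the fingerprint check deliberately does not, so parser drift surfaces at the parser layer instead of in downstream model tests.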
RE: https://github.com/allenai/scholar/issues/36386#issuecomment-1516825407