alephdata / ingest-file

Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
GNU Affero General Public License v3.0

Bump pymupdf from 1.21.1 to 1.24.0 #604

Closed: dependabot[bot] closed this 2 months ago

dependabot[bot] commented 3 months ago

Bumps pymupdf from 1.21.1 to 1.24.0.
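For reviewers weighing the size of this bump, here is a minimal sketch that classifies a version change as major, minor or patch. It assumes plain dotted numeric versions with three components; `parse_version` and `bump_kind` are hypothetical helper names, not part of Dependabot or PyMuPDF.

```python
def parse_version(text):
    """Split a dotted version string such as '1.24.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def bump_kind(old, new):
    """Return 'major', 'minor', 'patch' or 'none' for a change from old to new."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    # Walk the components left to right; the first one that differs names the bump.
    for label, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return label
    return "none"


print(bump_kind("1.21.1", "1.24.0"))  # minor
```

By this measure the PR is a minor-version bump, although it spans several upstream releases.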

Release notes

Sourced from pymupdf's releases.

PyMuPDF-1.24.0 released

PyMuPDF-1.24.0 has been released.

Wheels for Windows, Linux and MacOS, and the sdist, are available on pypi.org and can be installed in the usual way, for example:

python -m pip install --upgrade pymupdf

[Linux-aarch64 wheels will be built and uploaded later.]

Changes in version 1.24.0 (2024-03-21)

  • Fixed issues:

  • Other:

    • Use MuPDF-1.24.0.

    • Add support for redacting vector graphics.

    • Several fixes for table module

      • Add new method for outputting the table as a markdown string.

      • Address errors in computing the table header object:

        We now allow None as the cell value, because this will be resolved where needed (e.g. in the pandas DataFrame).

        We previously tried to enforce rect-like tuples in all header cell bboxes, however this fails for tables with all-None columns. This fix enables this and constructs an empty string in the corresponding cell string.

        We now correctly include start / stop points of lines in the bbox of the clustered graphic. We previously joined the line's rectangle - which had no effect because this is always empty.

... (truncated)

Changelog

Sourced from pymupdf's changelog.

Change Log

Changes in version 1.24.0 (2024-03-21)

  • Fixed issues:

    • Fixed #3281 (https://github.com/pymupdf/PyMuPDF/issues/3281): Preparing metadata (pyproject.toml) did not run successfully
    • Fixed #3279 (https://github.com/pymupdf/PyMuPDF/issues/3279): PyMuPDF no longer builds in Alpine Linux
    • Fixed #3257 (https://github.com/pymupdf/PyMuPDF/issues/3257): apply_redactions() deleting text outside of annoted box
    • Fixed #3216 (https://github.com/pymupdf/PyMuPDF/issues/3216): AttributeError: 'Annot' object has no attribute 'del'
    • Fixed #3207 (https://github.com/pymupdf/PyMuPDF/issues/3207): get_drawings's items is missing line from h path operator
    • Fixed #3201 (https://github.com/pymupdf/PyMuPDF/issues/3201): Memory leaks when merging PDFs
    • Fixed #3197 (https://github.com/pymupdf/PyMuPDF/issues/3197): page.get_text() returns hexadecimal text for some characters
    • Fixed #3196 (https://github.com/pymupdf/PyMuPDF/issues/3196): Remove text not working in 1.23.25 version vs 1.20.2
    • Fixed #3172 (https://github.com/pymupdf/PyMuPDF/issues/3172): PDF's 45º lines dissapearing in png conversion
    • Fixed #3135 (https://github.com/pymupdf/PyMuPDF/issues/3135): Do not log warnings to stdout
    • Fixed #3125 (https://github.com/pymupdf/PyMuPDF/issues/3125): get_pixmap method stuck on one page and runs forever
    • Fixed #2964 (https://github.com/pymupdf/PyMuPDF/issues/2964): There is an issue with the image generated by the page.get_pixmap() function
  • Other:

    • Use MuPDF-1.24.0.

    • Add support for redacting vector graphics.

    • Several fixes for table module

      • Add new method for outputting the table as a markdown string.

      • Address errors in computing the table header object:

        We now allow None as the cell value, because this will be resolved where needed (e.g. in the pandas DataFrame).

        We previously tried to enforce rect-like tuples in all header cell bboxes, however this fails for tables with all-None columns. This fix enables this and constructs an empty string in the corresponding cell string.

        We now correctly include start / stop points of lines in the bbox of the clustered graphic. We previously joined the line's rectangle - which had no effect because this is always empty.

    • Improved exception text if we fail to open document.

    • Fixed build with new libclang 18.
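To make the table-module notes above concrete, here is a rough illustration of rendering a table with None cells as a markdown string. This is not PyMuPDF's implementation: `cell_text` and `table_to_markdown` are hypothetical names, and the sketch assumes a table is just a header list plus a list of rows.

```python
def cell_text(value):
    """Render a cell; None becomes an empty string instead of raising."""
    return "" if value is None else str(value)


def table_to_markdown(header, rows):
    """Build a pipe-delimited markdown table from a header and data rows."""
    lines = ["|" + "|".join(cell_text(name) for name in header) + "|"]
    # The separator row needs one entry per header column.
    lines.append("|" + "|".join("---" for _ in header) + "|")
    for row in rows:
        lines.append("|" + "|".join(cell_text(cell) for cell in row) + "|")
    return "\n".join(lines)


print(table_to_markdown(["name", None], [["a", 1], ["b", None]]))
```

The None header cell and the None data cell both come out as empty strings, which mirrors the behaviour the changelog describes for all-None columns.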

Changes in version 1.23.26 (2024-02-29)

  • Fixed issues:

... (truncated)

Commits
  • 1118f02 Update changelog, version numbers and release dates for release 1.24.0.
  • c724fb4 Several fixes for table module
  • 7e9c85b Fixes for clang 18 and new mupdf release branch 1.24.x.
  • f039ad4 tests/test_general.py:test_open(): workaround swig bug on openbsd.
  • 517ecfa pyproject.toml: typo.
  • 36b1e5c setup.py: cope with new libclang 18, which breaks on macos/arm64.
  • 0ba51d3 typo het_toc instead of get_toc
  • 96d3171 @JoKalliauer has signed the CLA from Pull Request #3275
  • 86b04f9 tests/test_general.py:test_open(): fix failures on windows and linux sysinstall.
  • 06031b3 src/ tests/: improved exception text if we fail to open document.
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot] commented 2 months ago

Superseded by #616.