Closed kba closed 11 months ago
It looks like CI has problems with ocr-fileformat, maybe because of stricter tests.
Yes, the problem is in textract2page. cc @rue-a.
textract2page$ pip install .
Looking in indexes: https://pypi.org/simple, https://code.bib.uni-mannheim.de/api/packages/stweil/pypi/simple/
Processing /UB-Mannheim/ocr-fileformat/vendor/textract2page
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [212 lines of output]
/tmp/pip-build-env-rrv2e39h/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `description` defined outside of `pyproject.toml` is ignored.
!!
********************************************************************************
The following seems to be defined outside of `pyproject.toml`:
`description = 'Convert AWS Textract JSON to PRImA PAGE XML'`
According to the spec (see the link below), however, setuptools CANNOT
consider this value unless `description` is listed as `dynamic`.
https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
To prevent this problem, you can list `description` under `dynamic` or alternatively
remove the `[project]` table from your file and rely entirely on other means of
configuration.
********************************************************************************
!!
_handle_missing_dynamic(dist, project_table)
/tmp/pip-build-env-rrv2e39h/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `readme` defined outside of `pyproject.toml` is ignored.
!!
[...]
Yeah, and I can reproduce it locally. I will prepare a PR after the tech call.
See https://github.com/slub/textract2page/pull/13 for a hackish fix.
Now updating ocrd_fileformat to include https://github.com/UB-Mannheim/ocr-fileformat/pull/171 which in turn includes https://github.com/slub/textract2page/pull/13 to test the CI.
Updates core to v2.59.1 which includes the workflow endpoint, additional features for chunking and additional output formats for `ocrd workspace list-page`; fixing the file naming in the bagger; and the filtering by file group for `clone`, `zip bag` etc.
@stweil improved the `page2img` script in format-converters significantly.
@mikegerber did some house cleaning work on dinglehopper and ocrd_calamari.
ocrd_pagetopdf should now work properly on MacOS and supports the METS Server.
workflow-configuration contains additional XSLT to detect ID clashes and add missing confidence values, supports pretty printing XML in the CLIs and supports the METS Server.
tesseract is also updated to the latest state in master.
I will merge this tomorrow, let me know if I missed something.
I forgot to click on "Create pull request". Will merge ASAP once the CI is fixed.