OCR-D / ocrd_all

Master repository which includes most other OCR-D repositories as submodules
MIT License

📦 v2023-12-07 #400

Closed: kba closed this 12 months ago

kba commented 12 months ago

Updates core to v2.59.1, which includes the workflow endpoint, additional features for chunking, and additional output formats for ocrd workspace list-page, as well as fixes for the file naming in the bagger and for filtering by file group in clone, zip bag, etc.

@stweil improved the page2img script in format-converters significantly.

@mikegerber did some housekeeping work on dinglehopper and ocrd_calamari.

ocrd_pagetopdf should now work properly on macOS and supports the METS Server.

workflow-configuration contains additional XSLT to detect ID clashes and to add missing confidence values, supports pretty-printing XML in the CLIs, and supports the METS Server.
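workflow-configuration implements the ID-clash check as XSLT. Purely to illustrate the idea (this is not that stylesheet), here is a minimal Python sketch that reports ID values occurring more than once in an XML document. It assumes that IDs live in unqualified attributes named id (as in PAGE XML) or ID (as in METS). The helper find_id_clashes is hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def find_id_clashes(xml_text, attrs=("id", "ID")):
    """Return the sorted ID values that occur more than once.

    Hypothetical helper: every occurrence of a value in any of the
    attribute names listed in `attrs` is counted, across all elements.
    """
    root = ET.fromstring(xml_text)
    counts = Counter(
        elem.get(attr)
        for elem in root.iter()
        for attr in attrs
        if elem.get(attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical, heavily simplified PAGE-like sample (no namespaces, no coordinates)
sample = (
    "<PcGts><Page>"
    '<TextRegion id="r1"/><TextRegion id="r2"/><TextRegion id="r1"/>'
    "</Page></PcGts>"
)
print(find_id_clashes(sample))  # ['r1']
```

Real PAGE and METS files are namespaced, but unqualified attribute names such as id are still looked up by their plain name in ElementTree. The sketch therefore only needs the attribute names, not the namespace URIs.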

tesseract is also updated to the latest state of master.

I will merge this tomorrow; let me know if I missed something. I forgot to click on "Create pull request". Will merge ASAP once CI is fixed.

stweil commented 12 months ago

It looks like CI has problems with ocr-fileformat, maybe because of stricter tests.

stweil commented 12 months ago

Yes, the problem is in textract2page. cc @rue-a.

textract2page$ pip install .
Looking in indexes: https://pypi.org/simple, https://code.bib.uni-mannheim.de/api/packages/stweil/pypi/simple/
Processing /UB-Mannheim/ocr-fileformat/vendor/textract2page
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [212 lines of output]
      /tmp/pip-build-env-rrv2e39h/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `description` defined outside of `pyproject.toml` is ignored.
      !!

              ********************************************************************************
              The following seems to be defined outside of `pyproject.toml`:

              `description = 'Convert AWS Textract JSON to PRImA PAGE XML'`

              According to the spec (see the link below), however, setuptools CANNOT
              consider this value unless `description` is listed as `dynamic`.

              https://packaging.python.org/en/latest/specifications/declaring-project-metadata/

              To prevent this problem, you can list `description` under `dynamic` or alternatively
              remove the `[project]` table from your file and rely entirely on other means of
              configuration.
              ********************************************************************************

      !!
        _handle_missing_dynamic(dist, project_table)
      /tmp/pip-build-env-rrv2e39h/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:75: _MissingDynamic: `readme` defined outside of `pyproject.toml` is ignored.
      !!
[...]      
kba commented 12 months ago

Yeah, and I can reproduce it locally. I will prepare a PR after the tech call.

stweil commented 12 months ago

See https://github.com/slub/textract2page/pull/13 for a hackish fix.

kba commented 12 months ago

Now updating ocrd_fileformat to include https://github.com/UB-Mannheim/ocr-fileformat/pull/171 which in turn includes https://github.com/slub/textract2page/pull/13 to test the CI.