OCR-D / ocrd_calamari

Recognize text using Calamari OCR and the OCR-D framework
Apache License 2.0

DeprecationWarning: ID is deprecated; use file_id etc. #109

Closed: mikegerber closed this issue 2 months ago

mikegerber commented 10 months ago
/home/b-mg106/.pyenv/versions/3.11.6/envs/ocrd_calamari/lib/python3.11/site-packages/ocrd_utils/deprecate.py:29: DeprecationWarning: ID is deprecated; use file_id
    warnings.warn('{} is deprecated; use {}'.format(alias, new),
[...]
/home/b-mg106/.pyenv/versions/3.11.6/envs/ocrd_calamari/lib/python3.11/site-packages/ocrd_utils/deprecate.py:29: DeprecationWarning: pageId is deprecated; use page_id
    warnings.warn('{} is deprecated; use {}'.format(alias, new),

I'm a bit confused about where exactly I should be using the new names: the API docs still use pageId etc., and grepping through the code I see the new names only in the METS server code.


mikegerber commented 10 months ago

Not easy to debug because the warnings don't report the location in my code (I think). Things to try:

[list moved upstairs]
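
A minimal sketch of why such a warning can point at the library instead of the calling code. It assumes an alias decorator similar in spirit to what the traceback above suggests; `deprecated_alias` and `add_file` here are hypothetical stand-ins written for this illustration, not the actual ocrd_utils or ocrd code. With the default `stacklevel=1`, the reported location is the `warnings.warn` line itself; with `stacklevel=2`, it is the line that called the decorated function.

```python
import functools
import warnings


def deprecated_alias(**aliases):
    """Hypothetical decorator: accept old keyword names, warn, and map them to new ones."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in aliases.items():
                if old in kwargs:
                    # stacklevel=2 attributes the warning to the caller of wrapper(),
                    # i.e. the user's code, rather than to this line.
                    warnings.warn(
                        "{} is deprecated; use {}".format(old, new),
                        DeprecationWarning,
                        stacklevel=2,
                    )
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical function standing in for an API that renamed its keywords.
@deprecated_alias(pageId="page_id", ID="file_id")
def add_file(file_grp, file_id=None, page_id=None):
    return {"file_grp": file_grp, "file_id": file_id, "page_id": page_id}


# Record the warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    entry = add_file("OCR-D-OCR", ID="f1", pageId="p1")

messages = [str(w.message) for w in caught]
```

Each recorded warning carries `filename` and `lineno` attributes, which with `stacklevel=2` refer to the `add_file(...)` call rather than to the decorator.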

mikegerber commented 10 months ago

This does the trick:

pytest -k test_word_segmentation -W 'ignore:Call to deprecated create function' -W 'error:pageId is deprecated' -W 'error:ID is deprecated'

The first ignore only filters out warnings from upstream calamari (a different issue); the other two options cause the matching warnings to be treated as errors. The pytest output is still hard to decipher, but at least it gives the correct position in my own code.

This is with pytest, but the Python interpreter accepts the same -W options (untested).
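
The same idea can also be expressed from inside Python with the standard `warnings` module, sketched here under assumptions: `old_api` is a hypothetical stand-in that emits the same message as in the report, not a real ocrd function. Once the filter turns the warning into an exception, the resulting traceback includes the calling frame.

```python
import warnings


def old_api(**kwargs):
    """Hypothetical stand-in for a library call that still accepts an old keyword."""
    if "pageId" in kwargs:
        warnings.warn("pageId is deprecated; use page_id", DeprecationWarning)
        kwargs["page_id"] = kwargs.pop("pageId")
    return kwargs


def run_with_strict_filters():
    # catch_warnings() restores the previous filter state on exit.
    with warnings.catch_warnings():
        # Rough equivalents of -W 'error:pageId is deprecated' and
        # -W 'error:ID is deprecated'; the message is matched against
        # the start of the warning text.
        warnings.filterwarnings(
            "error", message="pageId is deprecated", category=DeprecationWarning
        )
        warnings.filterwarnings(
            "error", message="ID is deprecated", category=DeprecationWarning
        )
        try:
            old_api(pageId="PHYS_0001")
        except DeprecationWarning as exc:
            # The warning was raised as an exception instead of being printed.
            return str(exc)
    return None


result = run_with_strict_filters()
```

In a real debugging session one would let the exception propagate instead of catching it, so that the traceback shows which line of one's own code passed the old keyword.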

mikegerber commented 10 months ago

I need to talk to @joschrew about the difference between the names in ocrd vs. ocrd_models (commit https://github.com/OCR-D/core/commit/b252d5612771daa8eafb40323c3f6980db9c9c28). The API docs of Workspace.add_file point to the docs of ocrd_models (https://ocr-d.de/core/api/ocrd_models/ocrd_models.ocrd_mets.html#ocrd_models.ocrd_mets.OcrdMets.add_file), which use the old names ☺ That makes it hard or even impossible to get this right from the documentation alone.

joschrew commented 10 months ago

Hi. The commit you mention was part of a pull request I worked on. Ideas / Issue is here. I think the main idea behind it is: pageId (lowerCamelCase names) for ocrd_models, and page_id (snake_case names) for the ocrd API and CLI. Before the PR, the two styles were mixed inside the ocrd API and CLI. Hopefully this helps; please reach out if you want to talk.

mikegerber commented 10 months ago

I need to have a look at it again with this information! I think it's at least a bug in the API docs now.

mikegerber commented 2 months ago

Should be fixed in f0c7e82