OCR-D / ocrd_kraken

Wrapper for the kraken OCR engine
Apache License 2.0

Fix recognize coords #38

Closed bertsky closed 1 year ago

bertsky commented 1 year ago

attempts to fix #36

bertsky commented 1 year ago

BTW, I am also thinking about replacing … https://github.com/OCR-D/ocrd_kraken/blob/802c6b0b76a3e75070c680aa3b19d36142decf4e/ocrd_kraken/recognize.py#L63 … (i.e. the hard requirement to use binarized images if the model metadata expect that) with something more flexible.

But to date we have no general mechanism for this in OCR-D, besides some per-processor explicit feature_filter / feature_selector params (but not for Calamari or Tesseract).
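
One way to picture a "more flexible" variant: keep the model metadata as the default, but let an explicit setting override it. The following is only a sketch under that assumption; `choose_feature_selector` and its `override` argument are hypothetical names and not part of ocrd_kraken. The returned string is meant to be used as the `feature_selector` when the image is retrieved.

```python
from typing import Optional


def choose_feature_selector(one_channel_mode: Optional[str],
                            override: Optional[str] = None) -> str:
    """Return a feature selector string ('' means: no requirement).

    one_channel_mode: value from the model metadata, e.g. '1', 'L' or None.
    override: hypothetical user-supplied setting; if given, it wins.
    """
    if override is not None:
        # An explicit choice takes precedence over the model metadata.
        return override
    if one_channel_mode == '1':
        # The model claims bitonal input, so ask for a binarized image.
        return 'binarized'
    # Grayscale or colour models accept whatever image is available.
    return ''
```

With this, a model that declares `'1'` but works better on grayscale (as described further down in this thread) could be run with `override=''` without touching the model file.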

kba commented 1 year ago

> BTW, I am also thinking about replacing …
>
> https://github.com/OCR-D/ocrd_kraken/blob/802c6b0b76a3e75070c680aa3b19d36142decf4e/ocrd_kraken/recognize.py#L63
>
> … (i.e. the hard requirement to use binarized images if the model metadata expect that) with something more flexible. But to date we have no general mechanism for this in OCR-D, besides some per-processor explicit feature_filter / feature_selector params (but not for Calamari or Tesseract).

What would you replace this mechanism with?

bertsky commented 1 year ago

> What would you replace this mechanism with?

Not sure. Perhaps the mechanism just needs to be enforced during training. (In my case, I was using a model typewriter_best.mlmodel from UBMa which identified itself as one-channel but performed much better on grayscale, so maybe it was in fact not trained on binarised images, at least not exclusively. Only @stweil would know...)

stweil commented 1 year ago

We typically use grayscale or even colour images in our model trainings, for example http://idb.ub.uni-tuebingen.de/opendigi/walz_1976, but the training for the base models also consumed some binarized images.

stweil commented 1 year ago

If a model contains wrong metadata like claiming that it requires a binarized image, my preferred solution would be fixing that metadata.

kba commented 1 year ago

> If a model contains wrong metadata like claiming that it requires a binarized image, my preferred solution would be fixing that metadata.

That would be best of course. Consistent and correct model metadata on channels will become more important, the more we move away from purely bitonal models.

bertsky commented 1 year ago

Ok, I have checked the Kraken code base. The metadata should work:

  1. https://github.com/mittagessen/kraken/blob/b48145dad2de326af0c1d3c837d0f823d2331412/kraken/lib/train.py#L169-L178 sets the model metadata from the mode of the training dataset
  2. https://github.com/mittagessen/kraken/blob/b48145dad2de326af0c1d3c837d0f823d2331412/kraken/lib/dataset/recognition.py#L321 uses 1 by default
  3. https://github.com/mittagessen/kraken/blob/b48145dad2de326af0c1d3c837d0f823d2331412/kraken/lib/dataset/recognition.py#L394-L418 promotes the im_mode of the whole dataset if it encounters an input sample with "more" channels
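
As a rough illustration of steps 2 and 3 (not Kraken's actual code), the promotion can be pictured as follows. The names `MODE_RANK`, `promote_mode` and `dataset_mode` are made up for this sketch, and the ordering `'1'` < `'L'` < `'RGB'` is an assumption about what "more" channels means.

```python
# Assumed ordering of PIL mode names, from fewest to most channels.
MODE_RANK = {'1': 0, 'L': 1, 'RGB': 2}


def promote_mode(current: str, sample: str) -> str:
    """Keep whichever of the two modes carries more channel information."""
    return sample if MODE_RANK[sample] > MODE_RANK[current] else current


def dataset_mode(sample_modes, default: str = '1') -> str:
    """Start from the bitonal default and promote once per sample."""
    mode = default
    for sample in sample_modes:
        mode = promote_mode(mode, sample)
    return mode
```

Under these assumptions, a single grayscale sample in an otherwise bitonal dataset is enough to make the whole dataset (and thus the model metadata) report `'L'`.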

However, this could still mean that you had RGB images somewhere, which in (1), for whatever reason, do not get picked up for the model metadata itself. (I will open an issue for this.)

bertsky commented 1 year ago

> (I will open an issue for this.)

https://github.com/mittagessen/kraken/issues/522

stweil commented 1 year ago

I just checked our models with my script mlmodel.py. german_handwriting and german_print have one_channel_mode = 'L'; all others currently have one_channel_mode = '1'.

The script can be extended to not only dump or fix the metadata, but also to modify existing values or add new ones.
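
A minimal sketch of what such an extension might look like, assuming the metadata has already been extracted from the model file as a JSON string. How mlmodel.py actually reads and writes the file is not shown in this thread; `update_metadata` is a hypothetical helper, not part of that script.

```python
import json


def update_metadata(meta_json: str, **changes) -> str:
    """Return the metadata JSON with the given keys modified or added."""
    meta = json.loads(meta_json)
    # Existing keys are overwritten, unknown keys are added.
    meta.update(changes)
    return json.dumps(meta, sort_keys=True)


# Example: correct the declared channel mode of a model.
fixed = update_metadata('{"one_channel_mode": "1"}', one_channel_mode='L')
```

The result would still have to be written back into the model file by whatever mechanism the script already uses for fixing metadata.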