BTW, I am also thinking about replacing … https://github.com/OCR-D/ocrd_kraken/blob/802c6b0b76a3e75070c680aa3b19d36142decf4e/ocrd_kraken/recognize.py#L63 … (i.e. the hard requirement to use binarized images if the model metadata expects that) with something more flexible. But to date we have no general mechanism for this in OCR-D, besides some per-processor explicit `feature_filter` / `feature_selector` params (but not for Calamari or Tesseract).
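For illustration, this is roughly what such per-processor params map onto in OCR-D core; a minimal sketch, assuming `workspace` and `input_file` come from the usual processor loop (only `Workspace.image_from_page` and its `feature_selector`/`feature_filter` arguments are actual core API, the rest is illustrative):

```python
# Minimal sketch (not the actual ocrd_kraken code): how OCR-D core's
# feature_selector / feature_filter arguments choose among derived images.
from ocrd_modelfactory import page_from_file

# `workspace` and `input_file` are assumed to come from the usual
# Processor.process() loop; names here are only illustrative.
pcgts = page_from_file(workspace.download_file(input_file))
page = pcgts.get_Page()

# require a derived image that already carries the 'binarized' feature:
bin_image, coords, info = workspace.image_from_page(
    page, input_file.pageId, feature_selector='binarized')

# or, conversely, one that is guaranteed not to be binarized:
raw_image, coords, info = workspace.image_from_page(
    page, input_file.pageId, feature_filter='binarized')
```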
What would you replace this mechanism with?
Not sure. Perhaps the mechanism just needs to be enforced during training. (In my case, I was using a model `typewriter_best.mlmodel` from UBMa which identified itself as one-channel but performed much better on grayscale, so maybe it was in fact not trained on binarised images, at least not exclusively. Only @stweil would know...)
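For what it's worth, the claim a model makes can be checked directly; a minimal sketch using kraken's `TorchVGSLModel` (the model path is just the one mentioned above, the comments are illustrative):

```python
# Minimal sketch: inspect the channel metadata of a kraken recognition model.
from kraken.lib.vgsl import TorchVGSLModel

net = TorchVGSLModel.load_model('typewriter_best.mlmodel')
# '1' = binarized, 'L' = grayscale, None = not a single-channel model
print(net.one_channel_mode)
```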
We typically use grayscale or even colour images for our model training, for example http://idb.ub.uni-tuebingen.de/opendigi/walz_1976, but the training of the base models also consumed some binarized images.
If a model contains wrong metadata like claiming that it requires a binarized image, my preferred solution would be fixing that metadata.
That would be best, of course. Consistent and correct model metadata on channels will only become more important as we move away from purely bitonal models.
Ok, I have checked with the Kraken code base. Metadata should work:

1. the model's `one_channel_mode` is set from the dataset's `im_mode` by default
2. `im_mode` gets widened to the "largest" mode of the whole dataset if it encounters an input sample with "more" channels

However, this still could mean that you had RGB images somewhere, which in (1), for whatever reason, does not get picked up for the model metadata itself. (I will open an issue for this.)
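In other words, roughly this behaviour (a paraphrased sketch for illustration, not the actual Kraken dataset code):

```python
# Illustrative paraphrase of the mode-widening behaviour described above;
# this is NOT the actual Kraken training code.
MODE_ORDER = {'1': 0, 'L': 1, 'RGB': 2}  # bitonal < grayscale < colour

def widen_im_mode(current_mode, sample_mode):
    """Keep the 'largest' image mode seen across the whole dataset."""
    if MODE_ORDER.get(sample_mode, 0) > MODE_ORDER.get(current_mode, 0):
        return sample_mode
    return current_mode

# After iterating over all samples, the final im_mode is what the model's
# one_channel_mode metadata should be derived from.
```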
I just checked our models with my script `mlmodel.py`. `german_handwriting` and `german_print` have `one_channel_mode = 'L'`, all others currently have `one_channel_mode = '1'`. The script can be extended to not only dump or fix the metadata, but also to modify existing values or add new ones.
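For illustration (not the actual `mlmodel.py`), dumping and fixing that flag could look roughly like this via kraken's Python API, assuming `one_channel_mode` can be assigned before re-saving and with purely illustrative file names:

```python
# Minimal sketch (not the actual mlmodel.py script): dump and fix the
# channel metadata of a kraken model, then write it back out.
from kraken.lib.vgsl import TorchVGSLModel

path = 'german_print.mlmodel'  # illustrative path
net = TorchVGSLModel.load_model(path)
print(path, net.one_channel_mode)        # dump the current value, e.g. '1' or 'L'

net.one_channel_mode = 'L'               # claim grayscale input instead of binarized
net.save_model('german_print.fixed.mlmodel')
```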
attempts to fix #36