OCR-D / ocrd_froc

Apache License 2.0

run with font class priors #7

Open bertsky opened 1 year ago

bertsky commented 1 year ago

It would be really nice if it were possible to constrain the font predictions to classes known in advance. This could be implemented in the OCR-D wrapper by suppressing certain results from the prediction, but ideally it would be passed to the neural network decoder so that all the probability mass gets reassigned.

For example, if I know the document only contains Fraktur and Antiqua, or Hebrew and Greek, or Antiqua and Italic and Manuscript, or Gotico-Antiqua and Schwabacher, then I don't want to risk "surprise" outliers (or systematic misclassification as in the Greek-Italic example).
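
For illustration, here is a minimal sketch of the wrapper-level variant, i.e. suppressing disallowed classes in the classifier output and renormalising the rest. The class labels and function name are hypothetical, not the actual ocrd_froc API:

```python
# Minimal sketch (not the actual ocrd_froc API): restrict a font-class
# probability distribution to classes known in advance and reassign
# the remaining probability mass by renormalisation.
import numpy as np

FONT_CLASSES = ["antiqua", "italic", "fraktur", "schwabacher",
                "gotico_antiqua", "greek", "hebrew", "manuscript"]  # illustrative labels

def apply_font_prior(probs, allowed):
    """Zero out disallowed classes and renormalise the rest."""
    mask = np.array([c in allowed for c in FONT_CLASSES], dtype=float)
    constrained = probs * mask
    total = constrained.sum()
    if total == 0.0:
        # all mass was on suppressed classes; fall back to a uniform
        # distribution over the allowed set
        constrained = mask / mask.sum()
    else:
        constrained /= total
    return constrained

# e.g. a document known to contain only Fraktur and Antiqua
probs = np.array([0.30, 0.05, 0.40, 0.10, 0.05, 0.04, 0.03, 0.03])
print(apply_font_prior(probs, {"antiqua", "fraktur"}))
```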

GemCarr commented 1 year ago

This option can easily be added for the font classifier, which could improve performance and will ensure these classes never appear in the predictions, so I will do that. For COCR unfortunately there is no easy way to add it for now, as we don't have dedicated parts of the model for specific fonts that we could 'turn off'.
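
A sketch of how that could look inside the classifier, assuming a softmax output layer: mask the logits of the disallowed classes so the probability mass is reassigned by the softmax itself. This is illustrative only, not the actual ocrd_froc code:

```python
# Hypothetical logit masking: suppressed classes get -inf before the
# softmax, so they can never appear in the prediction and their mass
# goes to the remaining classes.
import torch

def masked_font_softmax(logits, allowed_idx):
    """logits: (batch, n_classes); allowed_idx: list of permitted class indices."""
    mask = torch.full_like(logits, float("-inf"))
    mask[:, allowed_idx] = 0.0
    return torch.softmax(logits + mask, dim=-1)

logits = torch.randn(1, 8)
probs = masked_font_softmax(logits, allowed_idx=[0, 2])  # e.g. Antiqua + Fraktur
assert torch.allclose(probs.sum(), torch.tensor(1.0))
```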

bertsky commented 1 year ago

For COCR unfortunately there is no easy way to add it for now, as we don't have dedicated parts of the model for specific fonts that we could 'turn off'.

Yes, I guess that would require changing the network of COCR, with an input-as-output scheme (i.e. representing the font as an additional output dimension).

GemCarr commented 1 year ago

It's also linked to the training process of the model: at the beginning we have dedicated modules specialized on specific fonts, but then the whole network is fine-tuned at once. The result is a kind of interlinked structure where these modules are no longer specialized on specific fonts, but most probably on a mix of fonts. Maybe the structure can be modified to ignore specific classes at a later point; I will investigate.

bertsky commented 1 year ago

It's also linked to the training process of the model: at the beginning we have dedicated modules specialized on specific fonts, but then the whole network is fine-tuned at once. The result is a kind of interlinked structure where these modules are no longer specialized on specific fonts, but most probably on a mix of fonts.

Oh, interesting. I do think this would still be compatible with an input-as-output extension. The network would simply (be forced to) learn to factor this in at every phase (perhaps with some custom regularizer).

Or you could just add it as another (uninitialized) layer during the fine-tuning phase.
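
As a rough illustration of the conditioning side of that idea (purely hypothetical, not the actual COCR architecture): the font prior enters the decoder as an extra input vector alongside the visual features, e.g. a distribution over the fonts known to be present:

```python
# Sketch of a font-conditioned decoder; all names and dimensions are
# assumptions made up for this example.
import torch
import torch.nn as nn

class FontConditionedDecoder(nn.Module):
    def __init__(self, feat_dim=256, n_fonts=8, n_chars=100, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim + n_fonts, hidden, batch_first=True)
        self.char_head = nn.Linear(hidden, n_chars)

    def forward(self, feats, font_prior):
        # feats: (batch, time, feat_dim); font_prior: (batch, n_fonts)
        prior = font_prior.unsqueeze(1).expand(-1, feats.size(1), -1)
        out, _ = self.rnn(torch.cat([feats, prior], dim=-1))
        return self.char_head(out)

decoder = FontConditionedDecoder()
feats = torch.randn(2, 50, 256)
prior = torch.tensor([[0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]]).expand(2, -1)
char_logits = decoder(feats, prior)  # (2, 50, 100)
```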

seuretm commented 1 year ago

For now, we have tried once to modify the COCR architecture to also output font groups at character level, which partially (but not fully) forced the different components to specialize for different font groups. Unfortunately, it had a negative impact on the CER. Investigating this further is on our to-do list, but it will require time, as training combined OCR models isn't fast.
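
For reference, a hypothetical sketch of such a dual-head setup, with a CTC loss on the characters and a per-step cross-entropy on the font groups; all names and dimensions here are assumptions, not the actual COCR code:

```python
# Shared encoder output feeding two heads: character logits (CTC) and
# per-step font-group logits (cross-entropy).
import torch
import torch.nn as nn

class JointCharFontHead(nn.Module):
    def __init__(self, hidden=256, n_chars=100, n_fonts=8):
        super().__init__()
        self.char_head = nn.Linear(hidden, n_chars)
        self.font_head = nn.Linear(hidden, n_fonts)

    def forward(self, encoded):  # encoded: (time, batch, hidden)
        return self.char_head(encoded), self.font_head(encoded)

head = JointCharFontHead()
encoded = torch.randn(50, 2, 256)            # dummy encoder output
char_logits, font_logits = head(encoded)

ctc = nn.CTCLoss(blank=0)
ce = nn.CrossEntropyLoss()
char_targets = torch.randint(1, 100, (2, 20))  # dummy transcriptions
font_targets = torch.randint(0, 8, (50, 2))    # dummy per-step font labels
loss = ctc(char_logits.log_softmax(-1), char_targets,
           torch.full((2,), 50), torch.full((2,), 20)) \
       + ce(font_logits.reshape(-1, 8), font_targets.reshape(-1))
```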