Calamari-OCR / calamari

Line based ATR Engine based on OCRopy
Apache License 2.0

network topology at CNN-RNN interface #353

Open bertsky opened 7 months ago

bertsky commented 7 months ago

Calamari's network specs do not contain or require a reshaping/projection operation before the first LSTM layer; this seems to be added automatically.
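To make concrete what such an implicit reshape typically amounts to, here is a minimal pure-Python sketch. It is not Calamari's actual code: the function name `flatten_height` and the nested-list layout `features[y][x][c]` are assumptions chosen for illustration only.

```python
def flatten_height(features):
    """Collapse (height, width, channels) into (width, height * channels).

    `features` is a nested list indexed as features[y][x][c]. Each width
    position becomes one time step whose feature vector concatenates the
    channel vectors of all rows in that column.
    """
    height = len(features)
    width = len(features[0])
    return [
        [value for y in range(height) for value in features[y][x]]
        for x in range(width)
    ]


# Toy CNN output: height 2, width 3, 2 channels (hypothetical values).
cnn_out = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
sequence = flatten_height(cnn_out)
print(len(sequence), len(sequence[0]))  # 3 time steps, 4 features each
```

With this kind of reshape, the vertical position of a feature is baked into its index in the flattened vector, so the RNN only sees a meaningful input if the text sits at a consistent height in every line image.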

However, other traditional CNN-RNN implementations offer an alternative element: an LSTM that takes the height axis as its sequence and summarises each column into a single output vector per width position.
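A rough sketch of the idea, to illustrate the difference from a plain reshape. For brevity it swaps the LSTM for a simple tanh recurrent cell, and all names (`rnn_step`, `summarise_height`) and the fixed weights are hypothetical; it is not taken from any particular implementation.

```python
import math


def rnn_step(state, inputs, w_in, w_rec):
    """One Elman-style step: new_state = tanh(w_in @ inputs + w_rec @ state)."""
    return [
        math.tanh(
            sum(w * v for w, v in zip(w_in[j], inputs))
            + sum(w * s for w, s in zip(w_rec[j], state))
        )
        for j in range(len(state))
    ]


def summarise_height(features, w_in, w_rec):
    """Map features[y][x][c] to one hidden vector per width position.

    For each column x, the rows are fed top to bottom through the recurrent
    cell and only the final state is kept.
    """
    hidden = len(w_rec)
    width = len(features[0])
    summaries = []
    for x in range(width):
        state = [0.0] * hidden
        for row in features:
            state = rnn_step(state, row[x], w_in, w_rec)
        summaries.append(state)
    return summaries


# Toy CNN output: height 2, width 3, 2 channels; hidden size 2.
cnn_out = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8], [0.9, 1.0], [1.1, 1.2]],
]
w_in = [[0.5, -0.5], [0.25, 0.25]]  # hypothetical fixed weights
w_rec = [[0.1, 0.0], [0.0, 0.1]]
summary = summarise_height(cnn_out, w_in, w_rec)
print(len(summary), len(summary[0]))  # 3 width positions, 2 values each
```

Because the same cell weights are applied at every row, the size of the summary vector does not depend on the image height, which is not true of the flattened vector produced by a reshape.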

Is it perhaps expected that the combination of reshape and CenterNormalizer will do a better job? I wonder whether this has ever been thoroughly investigated. Also, CenterNormalizer might degrade rather than improve horizontal statistics, especially for handwriting (where some have even argued a need for deslanting) or with grayscale input.