Closed vfbsilva closed 4 years ago
Calamari is an ATR (automatic text recognition) engine only. Its input is a single text line, so the page must already be segmented into lines in a previous step. Example files that can be used for training/prediction are located here: https://github.com/Calamari-OCR/calamari/tree/master/calamari_ocr/test/data. The simplest way is to use pairs of line images and text files, where each text file holds the transcription of its line image (e.g. https://github.com/Calamari-OCR/calamari/tree/master/calamari_ocr/test/data/uw3_50lines/train).
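To make the "pairs of line images and text files" layout concrete, here is a minimal sketch that collects such pairs from a directory. It is not part of Calamari. The function name `find_training_pairs` is hypothetical, and the naming convention (an image such as `0001.bin.png` next to a transcription `0001.gt.txt`) is an assumption based on the linked example directory, so adjust the suffixes to match your data.

```python
from pathlib import Path


def find_training_pairs(data_dir, image_suffix=".png", gt_suffix=".gt.txt"):
    """Return (image_path, transcription) tuples for every line image
    that has a matching ground-truth text file in data_dir.

    Assumption: the part of the file name before the first dot is shared
    by the image and its ground-truth file.
    """
    pairs = []
    for image_path in sorted(Path(data_dir).glob("*" + image_suffix)):
        stem = image_path.name.split(".", 1)[0]
        gt_path = image_path.with_name(stem + gt_suffix)
        if gt_path.is_file():
            # One text line per file; strip the trailing newline.
            text = gt_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, text))
    return pairs
```

Running this over your own data before training is a quick way to spot line images that still lack a transcription, since those are silently left out of the returned list.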
Does the background of the input text have to be white?
In all of our use cases the background was white, but in general the color can be arbitrary. However, I expect that a non-white background requires significantly more ground truth (GT) for training. Therefore, I recommend binarizing your input, which should be straightforward for your ID cards: converting to grayscale and then applying Otsu thresholding should suffice.
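The grayscale-then-Otsu step can be sketched in plain Python as follows. This is an illustration of the technique, not Calamari code. The helper names (`to_gray`, `otsu_threshold`, `binarize`) are hypothetical, and images are assumed to be nested lists of pixel values. In practice a library routine such as OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag is likely the more convenient choice.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0..255 luma values."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_image]


def otsu_threshold(gray):
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]          # pixels with value <= t
        sum_bg += t * hist[t]
        w_fg = total - w_bg      # pixels with value > t
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(gray):
    """Map pixels above the Otsu threshold to white (255), the rest to black (0)."""
    t = otsu_threshold(gray)
    return [[255 if value > t else 0 for value in row] for row in gray]
```

For example, `binarize(to_gray(image))` turns dark text on a light colored background into black text on white, which matches the kind of input described above.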
Where can I find samples or tutorials on how to create the training files? I want to use Calamari to recover data from ID cards like the one in the attached image. Is that feasible?