Open twerkmeister opened 2 years ago
Hi Thomas, it is not possible to insert another processor without providing the full set of parameters, i.e. the module path and ProcessorParams class for each of them. It's quite a mouthful:
--data.pre_proc.processors \
calamari_ocr.ocr.dataset.imageprocessors.scale_to_height_processor:ScaleToHeightProcessorParams \
calamari_ocr.ocr.dataset.imageprocessors.final_preparation:FinalPreparationProcessorParams \
calamari_ocr.ocr.dataset.textprocessors:BidiTextProcessorParams \
calamari_ocr.ocr.dataset.textprocessors:StripTextProcessorParams \
calamari_ocr.ocr.dataset.textprocessors:TextNormalizerProcessorParams \
calamari_ocr.ocr.dataset.textprocessors:TextRegularizerProcessorParams \
calamari_ocr.ocr.dataset.imageprocessors:AugmentationProcessorParams \
calamari_ocr.ocr.dataset.imageprocessors:PrepareSampleProcessorParams
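Because the list is long and easy to mistype, one option is to generate the arguments from a small script. The following is only a minimal sketch: the processor paths are copied verbatim from the list above, while the helper names build_processor_args and as_shell_fragment are hypothetical and not part of calamari or tfaip. It only assembles strings and does not check that the classes exist in your installed version.

```python
import shlex

# Processor list copied from the reply above, as (module path, params class) pairs.
# Edit this list to add, remove, or reorder processors.
PROCESSORS = [
    ("calamari_ocr.ocr.dataset.imageprocessors.scale_to_height_processor", "ScaleToHeightProcessorParams"),
    ("calamari_ocr.ocr.dataset.imageprocessors.final_preparation", "FinalPreparationProcessorParams"),
    ("calamari_ocr.ocr.dataset.textprocessors", "BidiTextProcessorParams"),
    ("calamari_ocr.ocr.dataset.textprocessors", "StripTextProcessorParams"),
    ("calamari_ocr.ocr.dataset.textprocessors", "TextNormalizerProcessorParams"),
    ("calamari_ocr.ocr.dataset.textprocessors", "TextRegularizerProcessorParams"),
    ("calamari_ocr.ocr.dataset.imageprocessors", "AugmentationProcessorParams"),
    ("calamari_ocr.ocr.dataset.imageprocessors", "PrepareSampleProcessorParams"),
]


def build_processor_args(processors):
    """Hypothetical helper: return the flag followed by one module:Class token per processor."""
    return ["--data.pre_proc.processors"] + [f"{module}:{cls}" for module, cls in processors]


def as_shell_fragment(args):
    """Hypothetical helper: join tokens with backslash-newlines for pasting after a training command."""
    return " \\\n".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(as_shell_fragment(build_processor_args(PROCESSORS)))
```

The printed fragment can be appended to the usual training command. Keeping the processors in a Python list makes it easier to see at a glance which one was exchanged.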
Thank you so much @andbue, I think that's the starting point I needed 👍 Will try it out tomorrow.
This should be way easier to find. The CLI (tfaip's subcommand self-documentation) is not helpful.
Also, this should be the default if passing --train.channels 3
...
Hi @ChWick @andbue!
Thanks for this amazing project. I am using calamari as part of a data extraction task for tables in mid-20th-century documents. Specifically, I run calamari on the (single-line) cells of the tables. I have had a really satisfying experience so far and have made good progress training and tuning on my growing dataset (12500 cell images by now). I've been digging a bit deeper into the documentation and code, and one of the experiments I want to try is turning off the center normalizer preprocessor, as it seems to be doing a few things that I am not sure are necessary or helpful for my data. First, image quality is at times already low, so additional blurring might be hurting. Second, my lines aren't skewed: I deal with skewing before overlaying the table grid onto the page.
I saw this parameter in the docs:
and also this code in another issue
What I would like to try is to just exchange the center normalizer preprocessor for the basic scale to height preprocessor. But I am not sure how to achieve this. Do I need to define all the preprocessors and their parameters using --data.pre_proc.processors? Would you mind giving me an idea how to reference the classes and their parameters properly? And does my reasoning to exchange the preprocessor have some merit?
Best, Thomas