VamosC / CLIP4STR

An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
Apache License 2.0

Is there a way to detect spaces? #8

Closed morgankohler closed 4 months ago

morgankohler commented 4 months ago

Thank you for the great work and the released models. I noticed that the tokenizer does not include spaces. Was the model not trained on them, or is there a way to add them to the tokenizer?

VamosC commented 4 months ago

As far as I know, spaces are not included in common STR benchmarks such as ICDAR-13 or ICDAR-15, and the released models were not trained with spaces. I think you can add a space token to the charset in the config file: https://github.com/VamosC/CLIP4STR/blob/main/configs/charset/94_full.yaml

STR is basically a classification problem, so you can treat the space as a new class. However, you need to modify the tokenizer accordingly: https://github.com/VamosC/CLIP4STR/blob/main/strhub/data/utils.py
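To make the "space as a new class" idea concrete, here is a minimal, self-contained sketch of a character-level tokenizer whose charset has a space appended. It is not the code in `strhub/data/utils.py`: the names `CHARSET_WITH_SPACE` and `SimpleCharTokenizer` are hypothetical, and the 94-character base set is assumed to be digits, ASCII letters, and ASCII punctuation.

```python
import string

# Assumed 94-character base set (digits + letters + punctuation), plus a space.
# Hypothetical constant; check the actual charset YAML before relying on this.
CHARSET_WITH_SPACE = string.digits + string.ascii_letters + string.punctuation + " "


class SimpleCharTokenizer:
    """Toy character-level tokenizer for illustration; not the strhub one."""

    EOS = "[E]"  # end-of-sequence marker, stored as a single vocabulary entry

    def __init__(self, charset):
        # Index 0 is reserved for EOS; every character gets its own class id.
        self.itos = (self.EOS,) + tuple(charset)
        self.stoi = {token: idx for idx, token in enumerate(self.itos)}

    def encode(self, label):
        # Raises KeyError for characters outside the charset.
        return [self.stoi[ch] for ch in label] + [self.stoi[self.EOS]]

    def decode(self, ids):
        chars = []
        for idx in ids:
            if idx == self.stoi[self.EOS]:
                break  # stop at the first end-of-sequence id
            chars.append(self.itos[idx])
        return "".join(chars)


tokenizer = SimpleCharTokenizer(CHARSET_WITH_SPACE)
ids = tokenizer.encode("hello world")
print(tokenizer.decode(ids))
```

Because the space adds one more class, the output layer grows by one, so the released checkpoints would presumably need fine-tuning or retraining on labels that contain spaces. It is also worth checking whether the data loading or charset filtering code strips whitespace from labels before they reach the tokenizer.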

You can also look at other approaches, such as text spotting: https://arxiv.org/pdf/2204.01918.pdf