VamosC / CLIP4STR

An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
Apache License 2.0
90 stars 12 forks source link

Could training Parseq directly on image data containing text content sampled from the 400 million images used to train CLIP potentially yield better results? #16

Closed Apostatee closed 1 month ago