An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
115 stars, 14 forks
Upload CLIP4STR Pre-trained on DataComp-1B, LAION-2B, and DFN-5B #11
Closed by mzhaoshuai 5 months ago