Self-supervised Character-to-Character Distillation for Text Recognition （ICCV23）

This is the official code of "Self-supervised Character-to-Character Distillation for Text Recognition". For more details, please refer to our paper or 中文解读 or poster. If you have any questions please contact me by email (gtk0615@sjtu.edu.cn).

We also released CVPR23 work on scene text recognition:

Self-Supervised Implicit Glyph Attention for Text Recognition (SIGA) Paper and Code

Pipeline

Model architecture

examples

Environments

# 3090 Ubuntu 16.04 Cuda 11
conda create -n CCD python==3.7
source activate CCD
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
# The following optional dependencies are necessary
please refer to requirement.txt
pip install tensorboard==1.15.0
pip install tensorboardX==2.2

Pretrain

# We recommend using multiple 3090s for training.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --config ./Dino/configs/CCD_pretrain_ViT_xxx.yaml

Finetune

#update model.pretrain_checkpoint in CCD_vision_model_xxx.yaml
CUDA_VISIBLE_DEVICES=0,1 python train_finetune.py --config ./Dino/configs/CCD_vision_model_xxx.yaml

Test

#update model.checkpoint in CCD_vision_model_xxx.yaml
CUDA_VISIBLE_DEVICES=0 python test.py --config ./Dino/configs/CCD_vision_model_xxx.yaml

Weights

Pretrain: CCD-ViT-Small,3090 Finetune: ARD,3090 and STD,3090

pretrain: CCD-ViT-Base,3090 Finetune:ARD,v100 and STD,3090

Data (please refer to DiG)

OCR-CC Link:

链接：https://pan.baidu.com/s/1PW7ef17AkwG27H_RaP0pDQ 
提取码：C2CD

data_lmdb
├── charset_36.txt
├── Mask
├── TextSeg
├── Super_Resolution
├── training
│   ├── label
│   │   └── synth
│   │       ├── MJ
│   │       │   ├── MJ_train
│   │       │   ├── MJ_valid
│   │       │   └── MJ_test
│   │       └── ST
│   │── URD
│   │    └── OCR-CC
│   ├── ARD
│   │   ├── Openimages
│   │   │   ├── train_1
│   │   │   ├── train_2
│   │   │   ├── train_5
│   │   │   ├── train_f
│   │   │   └── validation
│   │   └── TextOCR 
├── validation
│   ├── 1.SVT
│   ├── 2.IIIT
│   ├── 3.IC13
│   ├── 4.IC15
│   ├── 5.COCO
│   ├── 6.RCTW17
│   ├── 7.Uber
│   ├── 8.ArT
│   ├── 9.LSVT
│   ├── 10.MLT19
│   └── 11.ReCTS
└── evaluation
    └── benchmark
        ├── SVT
        ├── IIIT5k_3000
        ├── IC13_1015
        ├── IC15_2077
        ├── SVTP
        ├── CUTE80
        ├── COCOText
        ├── CTW
        ├── TotalText
        ├── HOST
        ├── WOST
        ├── MPSC
        └── WordArt

Mask preparation

optional, kmeans results of Synth and URD

if you don't want to generate mask, you can generate mask results online. please rewrite code1 and code2

cd ./mask_create
run generate_mask.py #parallelly process mask --> lmdb file
run merge.py #merge multiple lmdb files into single file

Citation

If you find our method useful for your reserach, please cite
@InProceedings{Guan_2023_ICCV,
    author    = {Guan, Tongkun and Shen, Wei and Yang, Xue and Feng, Qi and Jiang, Zekun and Yang, Xiaokang},
    title     = {Self-Supervised Character-to-Character Distillation for Text Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {19473-19484}
}

License

- This code are only free for academic research purposes and licensed under the 2-clause BSD License - see the LICENSE file for details.

TongkunGuan / CCD

readme