TongkunGuan / CCD

[ICCV2023] Self-supervised Character-to-Character Distillation for Text Recognition
https://openaccess.thecvf.com/content/ICCV2023/papers/Guan_Self-Supervised_Character-to-Character_Distillation_for_Text_Recognition_ICCV_2023_paper.pdf
140 stars 7 forks source link
scene-text-detection-recognition scene-text-recognition scene-text-segmentation super-resolution

Self-supervised Character-to-Character Distillation for Text Recognition (ICCV23)

This is the official code of "Self-supervised Character-to-Character Distillation for Text Recognition". For more details, please refer to our paper or 中文解读 or poster. If you have any questions please contact me by email (gtk0615@sjtu.edu.cn).

We also released CVPR23 work on scene text recognition:

Pipeline

Model architecture

examples

Environments

# 3090 Ubuntu 16.04 Cuda 11
conda create -n CCD python==3.7
source activate CCD
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
# The following optional dependencies are necessary
please refer to requirement.txt
pip install tensorboard==1.15.0
pip install tensorboardX==2.2

Pretrain

# We recommend using multiple 3090s for training.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --config ./Dino/configs/CCD_pretrain_ViT_xxx.yaml

Finetune

#update model.pretrain_checkpoint in CCD_vision_model_xxx.yaml
CUDA_VISIBLE_DEVICES=0,1 python train_finetune.py --config ./Dino/configs/CCD_vision_model_xxx.yaml

Test

#update model.checkpoint in CCD_vision_model_xxx.yaml
CUDA_VISIBLE_DEVICES=0 python test.py --config ./Dino/configs/CCD_vision_model_xxx.yaml

Weights

Mask preparation

Citation

If you find our method useful for your reserach, please cite
@InProceedings{Guan_2023_ICCV,
    author    = {Guan, Tongkun and Shen, Wei and Yang, Xue and Feng, Qi and Jiang, Zekun and Yang, Xiaokang},
    title     = {Self-Supervised Character-to-Character Distillation for Text Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {19473-19484}
}

License

- This code are only free for academic research purposes and licensed under the 2-clause BSD License - see the LICENSE file for details.