OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
MIT License

Cannot reproduce the paper's results #242

Open nlpplus opened 7 months ago

nlpplus commented 7 months ago

In the finetune experiment on the Flickr30k-CN data, I cannot reproduce the paper's results; in particular, performance on the validation set is lower by more than 10%.

Training configuration:

# Number of GPUs per GPU worker
GPUS_PER_NODE=8
# Number of GPU workers, for single-worker training, please set to 1
WORKER_CNT=1
# The ip address of the rank-0 worker, for single-worker training, please set to localhost
export MASTER_ADDR=127.0.0.1
# The port for communication
export MASTER_PORT=8514
# The rank of this worker, should be in {0, ..., WORKER_CNT-1}, for single-worker training, please set to 0
export RANK=0

context_length=52
warmup=100
batch_size=512
valid_batch_size=512
accum_freq=2
lr=5e-4
wd=0.001
max_epochs=30 # or you can alternatively specify --max-steps
valid_step_interval=150
valid_epoch_interval=1
vision_model=ViT-B-16
text_model=RoBERTa-wwm-ext-base-chinese
mask_ratio=0.5 # use flip: set mask ratio
use_augment="--use-augment"
# use_augment=""
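One thing worth checking when comparing this run against the paper's setup is the effective global batch size. A minimal sketch, assuming `batch_size` is per GPU and that `accum_freq` gradient-accumulation steps multiply it (these semantics are an assumption and may differ from what Chinese-CLIP's training script actually does):

```python
# Hypothetical sketch: effective global batch size under data-parallel training.
# Assumes batch_size is per GPU and accum_freq accumulation steps stack on top;
# this may not match Chinese-CLIP's exact semantics.
def effective_batch_size(per_gpu_batch: int, gpus_per_node: int,
                         worker_cnt: int, accum_freq: int) -> int:
    """Total samples contributing to one optimizer step across all workers."""
    return per_gpu_batch * gpus_per_node * worker_cnt * accum_freq

# The configuration from this issue: 8 GPUs, 1 worker, accum_freq=2.
print(effective_batch_size(512, 8, 1, 2))  # 8192
```

If the paper's runs used 32 GPUs with the same per-GPU settings, the global batch size (and hence the optimization dynamics under the same learning rate) would differ by a factor of 4.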

DtYXs commented 4 months ago

Hello, you can refer to the hyperparameter configuration given in our technical report; most of the finetune experiments were run on 32 GPUs. It looks like your finetune run is on a single machine with 8 GPUs, so you could try a COCO-CN finetune experiment using the default parameters of the COCO-CN script, which has been validated on a single 8-GPU machine.
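When dropping from 32 GPUs to 8 without changing per-GPU settings, a common rule of thumb (a general heuristic, not necessarily what the authors used) is to scale the learning rate linearly with the global batch size:

```python
# Linear learning-rate scaling heuristic: lr is scaled in proportion to the
# number of GPUs (i.e. to the global batch size). Offered as a rule of thumb,
# not as the authors' prescribed recipe.
def scaled_lr(base_lr: float, base_gpus: int, target_gpus: int) -> float:
    return base_lr * target_gpus / base_gpus

# If lr=5e-4 was tuned for 32 GPUs, an 8-GPU run would use:
print(scaled_lr(5e-4, 32, 8))  # 0.000125
```

This is only a starting point; the safest route remains the one suggested above, i.e. reusing a script whose defaults were validated at your actual GPU count.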