Question about the COCO captions example for the captioning task

I encountered a problem when running the following command:

clip_benchmark eval --dataset=mscoco_captions --dataset_root=/home/xzj/Desktop/CLIP/dataset --task=captioning --model=coca_ViT-L-14 --output=result.json --pretrained /home/xzj/Desktop/CLIP/dataset/train2014/logs/2024_05_15-16_11_56-model_coca_ViT-L-14-lr_1e-05-b_16-j_1-p_amp/checkpoints/epoch_1.pt

The error message is as follows: