ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
https://arxiv.org/abs/2104.08860
MIT License

dataset splits and training code #32

Closed zhedasuiyuan closed 2 years ago

zhedasuiyuan commented 2 years ago

Hi,

First, I would like to thank you for the nice work and the code. I have a few questions about the dataset splits. For example, MSVD and DiDeMo have both a validation set and a test set, but in the dataloader code it seems that you are also using the test set for validation.

Moreover, in the training code, Line 548, you use the test set to select the best checkpoint, which may not be the best practice, especially when a validation set is available for the dataset. What do you think? Thanks!

DATALOADER_DICT["msvd"] = {"train":dataloader_msvd_train, "val":dataloader_msvd_test, "test":dataloader_msvd_test}
DATALOADER_DICT["lsmdc"] = {"train":dataloader_lsmdc_train, "val":dataloader_lsmdc_test, "test":dataloader_lsmdc_test}
DATALOADER_DICT["didemo"] = {"train":dataloader_didemo_train, "val":dataloader_didemo_test, "test":dataloader_didemo_test}
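To illustrate the pattern behind the mapping above, here is a minimal sketch of how "val" and "test" can share one load function while selecting different splits via a `subset` parameter. The `dataloader_msvd` stand-in and its return value are hypothetical; the real loaders take more arguments (args, tokenizer, etc.).

```python
from functools import partial

# Hypothetical stand-in for the shared MSVD load function; the real
# dataloader builds a torch DataLoader over the chosen split.
def dataloader_msvd(subset="test"):
    return f"msvd-{subset}-loader"

# "val" and "test" call the same function with different subset values,
# so the dictionary can point each key at its proper split.
DATALOADER_DICT = {
    "msvd": {
        "train": partial(dataloader_msvd, subset="train"),
        "val": partial(dataloader_msvd, subset="val"),
        "test": partial(dataloader_msvd, subset="test"),
    }
}

print(DATALOADER_DICT["msvd"]["val"]())  # msvd-val-loader
```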
ArrowLuo commented 2 years ago

Hi @zhedasuiyuan,

First, the validation dataloader and the test dataloader share the same load function, so a parameter subset="val" is used to choose which split is loaded. See Line #L494 and Line #L497. The validation and test dataloaders are simply invoked with different parameters.

You are free to uncomment #L546 and use the validation R@1 to choose hyperparameters on the validation set. We commented out this line in the released version because it is time-consuming on a first run.
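The suggestion above amounts to standard validation-based model selection. A minimal sketch, assuming a hypothetical `eval_epoch(model, loader)` that returns R@1 on the given split (the names and signature are illustrative, not the repo's actual API):

```python
# Pick the checkpoint with the highest R@1 on the *validation* set,
# then report the test-set numbers only for that checkpoint.
def select_best_checkpoint(checkpoints, val_loader, eval_epoch):
    best_ckpt, best_r1 = None, float("-inf")
    for ckpt in checkpoints:
        r1 = eval_epoch(ckpt, val_loader)  # evaluate on validation, not test
        if r1 > best_r1:
            best_ckpt, best_r1 = ckpt, r1
    return best_ckpt, best_r1

# Usage with fake per-checkpoint scores standing in for real evaluation:
val_r1 = {"epoch1.pt": 40.0, "epoch2.pt": 43.5, "epoch3.pt": 42.1}
best, r1 = select_best_checkpoint(
    list(val_r1), val_loader=None, eval_epoch=lambda c, loader: val_r1[c]
)
print(best, r1)  # epoch2.pt 43.5
```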

Thanks.

zhedasuiyuan commented 2 years ago

Thanks for the reply. Just to confirm: for the results in the paper, the best checkpoints are chosen based on the validation set. Is that correct?

ArrowLuo commented 2 years ago

Yes, of course, if both the validation set and test set are available.