ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
https://arxiv.org/abs/2104.08860
MIT License

NaN loss when ft on MSR_VTT #59

Closed: mboboGO closed this issue 2 years ago.

mboboGO commented 2 years ago

When I directly fine-tune CLIP4Clip on MSR-VTT, the loss becomes NaN after about 100 iterations.
After checking the log, I think the cause may be that some videos missing from MSR-VTT produce many all-zero input tensors. So I changed the MSR-VTT dataloader to skip those missing videos [screenshot], and training now runs correctly.
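A minimal sketch of the filtering idea, assuming each video is stored as `<video_id>.mp4` in a single directory. The helper name `split_existing_videos` is hypothetical and not part of the CLIP4Clip code. It only checks that a file exists and is non-empty, and returns the IDs to keep along with the ones to skip.

```python
import os
from typing import Iterable, List, Tuple


def split_existing_videos(video_ids: Iterable[str], video_dir: str,
                          ext: str = ".mp4") -> Tuple[List[str], List[str]]:
    """Return (kept, missing) video IDs.

    Hypothetical helper: assumes one file per video named <video_id><ext>
    inside video_dir. A zero-byte file is treated as missing too.
    """
    kept, missing = [], []
    for vid in video_ids:
        path = os.path.join(video_dir, vid + ext)
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            kept.append(vid)
        else:
            missing.append(vid)
    return kept, missing
```

The kept IDs could then be used to filter the (video, caption) pairs before the dataloader builds its index, for example by building `kept_set = set(kept)` and keeping only pairs whose video ID is in it. Filtering once up front avoids repeating the file check on every `__getitem__` call.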

ArrowLuo commented 2 years ago

Hi @mboboGO, thanks for sharing this. Filtering out such videos during training is a good approach.

mboboGO commented 2 years ago

Hi, I also found that some videos exist but cannot be read properly; these should be removed as well. After skipping all missing and damaged videos, the fine-tuning results on MSR-VTT (after just one epoch) are [screenshot], which look good.
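Files that exist but fail to decode need a second check. The sketch below assumes OpenCV (`cv2`) is installed and treats a video as readable if `cv2.VideoCapture` opens it and its first frame decodes. The names `drop_unreadable` and `opencv_probe` are hypothetical, and the probe is passed in as a parameter so a different decoder can be swapped in.

```python
from typing import Callable, Iterable, List


def opencv_probe(path: str) -> bool:
    """Return True if OpenCV can open the file and decode its first frame."""
    import cv2  # assumed dependency; imported lazily so other probes work without it
    cap = cv2.VideoCapture(path)
    try:
        if not cap.isOpened():
            return False
        ok, frame = cap.read()
        return bool(ok) and frame is not None
    finally:
        cap.release()


def drop_unreadable(paths: Iterable[str],
                    probe: Callable[[str], bool] = opencv_probe) -> List[str]:
    """Keep only paths for which `probe` succeeds; any exception counts as unreadable."""
    readable = []
    for path in paths:
        try:
            ok = probe(path)
        except Exception:
            ok = False
        if ok:
            readable.append(path)
    return readable
```

Decoding only the first frame is a cheap heuristic: a file truncated near the end can still pass. A stricter probe could decode every frame the dataloader actually samples, at the cost of a slower one-off scan, and the resulting list of valid IDs is best cached rather than recomputed on every run.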

liuyongjie985 commented 2 years ago

Good idea to remove the broken videos.

ForawardStar commented 2 years ago

Hello, very good idea. Was your result obtained with 'meanP' or 'seqTransf'?