ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
https://arxiv.org/abs/2104.08860
MIT License

loss NaN when training on MSRVTT #93

Closed: TXH-mercury closed this issue 1 year ago

TXH-mercury commented 1 year ago

Running with --do_eval reproduces the correct zero-shot performance, but with --do_train the loss becomes NaN at the very start of training. This happens in both the 1-card and 4-card settings. I am using the default parameters.

sweet132 commented 1 year ago

Hello, has the NaN problem you described been solved? I ran into it as well. Modifying the data preprocessing makes the NaN go away, but the accuracy then falls short of what the paper reports.

ZhaiYanbo commented 1 month ago

I encountered the same problem as you. How did you resolve it?