loss becomes nan - Githubissues

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

https://arxiv.org/abs/2104.08860

MIT License

loss becomes nan #85

Closed fake-warrior8 closed 1 year ago

fake-warrior8 commented 2 years ago

Hi, I ran the code on the MSRVTT dataset with 2 A100s, and the loss becomes NaN after some iterations, as in this issue. However, I found that RawVideoExtractorCV2 reads the video successfully when I test a single video input (calling its video_to_tensor function directly), but fails to read videos with multiple num_workers when I run the provided scripts (the log line I added at line 63 is printed, but nothing is printed at line 211). Is there something wrong with the multi-threading setup?
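For anyone debugging the same symptom, a fail-fast check on the loss can show which batch first goes wrong. The following is a minimal sketch, not part of the CLIP4Clip code: check_loss and its video_ids argument are hypothetical names, and it assumes the loss is a scalar that float() can convert (a Python float or a 0-dim tensor).

```python
import math


def check_loss(loss, step, video_ids=None):
    """Return the loss as a float, or raise as soon as it is NaN or infinite.

    `video_ids` is a hypothetical argument: pass whatever identifies the
    samples in the current batch so the failing videos can be inspected.
    """
    value = float(loss)  # works for Python floats and 0-dim tensors
    if not math.isfinite(value):
        raise FloatingPointError(
            f"non-finite loss {value} at step {step}; video ids: {video_ids}"
        )
    return value
```

Calling this right after the loss is computed stops training at the first bad batch, instead of letting NaN values propagate through later iterations.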

ArrowLuo commented 2 years ago

Hi @fake-warrior8, sorry for the delayed reply. Nothing goes wrong when I run with multiple workers, so I have no idea what causes this problem. I wonder whether it is limited by I/O speed when using multiple threads (not sure). Best~

JackBaron-s commented 1 year ago

Hi, I also ran the code on the MSRVTT dataset with 2 GeForce 3090s and set the number of workers to 8. However, the loss becomes NaN, and the error message from dataloader_msrvtt_retrieval.py line 286 is printed: "video path: {} error. video id: {}".format(video_path, video_id).

Strangely, everything works fine when I run without multiple threads.
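One way to separate corrupt files from worker-related failures is to scan the dataset once in a single process and list the videos that cannot be decoded. This is a rough sketch under assumptions: find_unreadable and opencv_can_read are hypothetical helpers, not functions from the repository, and decoding one frame is only a coarse readability check.

```python
from typing import Callable, Iterable, List


def opencv_can_read(path: str) -> bool:
    """Return True if OpenCV opens the file and decodes at least one frame."""
    import cv2  # imported lazily so the scan logic does not require OpenCV

    cap = cv2.VideoCapture(path)
    try:
        if not cap.isOpened():
            return False
        ok, _frame = cap.read()
        return bool(ok)
    finally:
        cap.release()


def find_unreadable(
    paths: Iterable[str],
    can_read: Callable[[str], bool] = opencv_can_read,
) -> List[str]:
    """Collect the paths for which the reader fails or raises."""
    bad = []
    for path in paths:
        try:
            ok = can_read(path)
        except Exception:
            ok = False  # treat any reader exception as an unreadable file
        if not ok:
            bad.append(path)
    return bad
```

If this scan reports no bad files but training with several workers still prints the error, the cause is more likely in the multi-worker setup than in the video files themselves.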

JackBaron-s commented 1 year ago

@ArrowLuo I solved the problem by preprocessing the video dataset first. The code is available in compress_video.py; you can try it.
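The preprocessing idea can be sketched as re-encoding each video to a small, low-frame-rate copy before training. The code below is an illustration only and does not reproduce compress_video.py: it assumes ffmpeg is installed and on PATH, the function names are hypothetical, and the 224-pixel height and 3 fps defaults are illustrative values.

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(src: str, dst: str, height: int = 224, fps: int = 3) -> List[str]:
    """Build an ffmpeg command that downsizes and resamples one video.

    The height and fps defaults are assumptions chosen for illustration.
    """
    return [
        "ffmpeg", "-y",               # overwrite the output if it exists
        "-i", src,
        "-vf", f"scale=-2:{height}",  # keep aspect ratio, force an even width
        "-r", str(fps),               # output frame rate
        "-an",                        # drop the audio stream
        dst,
    ]


def compress_video(src: str, dst: str, **kwargs) -> None:
    """Run ffmpeg and raise CalledProcessError if the re-encode fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst, **kwargs), check=True)
```

A failed re-encode raises immediately, so unreadable source files show up during preprocessing rather than inside the dataloader.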

LinB203 commented 1 year ago

@fake-warrior8 I use torch 1.7.1 and it works well, but with other versions, e.g. torch 1.10, reading videos returns an error. The install command is as follows.

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

fake-warrior8 commented 1 year ago

Thank you for your advice. I ended up using a different codebase, and it works well.