HumamAlwassel / TSP

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks (ICCVW 2021)
http://humamalwassel.com/publication/tsp/
MIT License

The loss value is too large and does not decrease #17

Closed: ZChengLong578 closed this issue 2 years ago

ZChengLong578 commented 2 years ago

Hi @HumamAlwassel, sorry to bother you again. Previously I trained with no or very little background (no-action) data. Now that I have added more background (no-action) segments, the loss is very large and does not decrease. The specific situation is shown in the following screenshot: [image not preserved] Here are the files for the training and validation sets: [image not preserved] What can I do to solve this problem?

HumamAlwassel commented 2 years ago

Hi @ZChengLong578,

Unfortunately, I cannot tell what the issue is from the pictures you attached. However, I noticed that you are using clip_length=4 with frame_rate=15 fps. A clip of only 4 frames is too short to tell you much about the content of the video. My guess is that your network is having a hard time learning because the clips are too short and may look very similar to one another. My suggestion is to increase clip_length to 16 (as we did in TSP). However, you might need to increase frame_rate to 30 fps if many of your annotations are shorter than 16/15 ≈ 1.07 seconds.
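The arithmetic behind this suggestion: a clip of `clip_length` frames sampled at `frame_rate` fps spans `clip_length / frame_rate` seconds, so an annotated segment shorter than that cannot supply one full clip. A minimal sketch of that calculation follows; `min_action_duration` is a hypothetical helper for illustration, not a function from the TSP code.

```python
def min_action_duration(clip_length: int, frame_rate: float) -> float:
    """Seconds covered by one clip of `clip_length` frames sampled at `frame_rate` fps.

    Hypothetical helper: a segment shorter than this cannot hold a full clip.
    """
    if clip_length <= 0 or frame_rate <= 0:
        raise ValueError("clip_length and frame_rate must be positive")
    return clip_length / frame_rate


# Compare the settings discussed in this thread.
for clip_length, fps in [(4, 15), (16, 15), (16, 30)]:
    secs = min_action_duration(clip_length, fps)
    print(f"clip_length={clip_length:2d} @ {fps} fps -> {secs:.3f} s")
```

With clip_length=16, moving from 15 to 30 fps halves the minimum segment duration from about 1.07 s to about 0.53 s, at the cost of each clip covering half as much time.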

Hope this helps, Humam

ZChengLong578 commented 2 years ago

Hi, @HumamAlwassel,

Thank you for your reply. Setting clip_length to 16 did not fix the excessively large loss on its own. I have now removed the background (no-action) part and train on the action segments only, and with clip_length=16 the accuracy of the model improves greatly. However, my videos are short, so at frame_rate=15 nearly 35% of the action segments are filtered out. I therefore tried frame_rate=30, but the results were slightly worse than at 15. Since most of my action segments have whole-second durations, more than 96% of them can be used for training if frame_rate is set to 16 (16 frames then span exactly 1 second). However, I noticed that the data preparation phase processes the videos at 30 fps, which is exactly double 15. If I want to set frame_rate to 16, do I need to use 32 fps in the preparation phase?
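The trade-off described here can be checked directly against the annotations: for each candidate frame rate, count how many action segments are at least `clip_length / frame_rate` seconds long. A minimal sketch with made-up durations (real values would come from the annotation file, whose format is not shown in this thread); `fraction_usable` is a hypothetical helper, and the inclusive `>=` boundary is an assumption, since the actual training code may filter segments differently.

```python
from typing import Iterable


def fraction_usable(durations_s: Iterable[float], clip_length: int, frame_rate: float) -> float:
    """Fraction of action segments long enough to hold one full clip.

    Assumes (hypothetically) that a segment is usable when its duration is
    >= clip_length / frame_rate seconds, i.e. an inclusive boundary.
    """
    durations = list(durations_s)
    if not durations:
        return 0.0
    min_len = clip_length / frame_rate
    # Summing booleans counts the segments that pass the threshold.
    return sum(d >= min_len for d in durations) / len(durations)


# Made-up durations in seconds, mostly whole numbers as described above.
example_durations = [1.0, 1.0, 2.0, 3.0, 1.0, 0.8, 5.0, 1.0, 2.0, 4.0]

for fps in (15, 16, 30):
    share = fraction_usable(example_durations, clip_length=16, frame_rate=fps)
    print(f"frame_rate={fps}: {share:.0%} of segments usable")
```

In this toy data every 1-second segment falls just below the 16/15 s threshold at 15 fps but exactly meets the 1 s threshold at 16 fps, which is the effect described above; whether exact-boundary segments are kept depends on the comparison the real code uses.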

HumamAlwassel commented 2 years ago

Hi @ZChengLong578,

No need to change the video data preparation phase. The training code will do the sampling according to the frame_rate you specify as input.
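One way to see why re-extraction at 32 fps is unnecessary: a clip at any target frame rate can be drawn from frames decoded at 30 fps by stepping through them with a stride of `src_fps / dst_fps`. The sketch below is a hypothetical illustration of that kind of temporal resampling, not the TSP repository's actual sampling code; `resample_indices` and its parameters are assumptions.

```python
import math
from typing import List


def resample_indices(num_src_frames: int, src_fps: float, dst_fps: float) -> List[int]:
    """Indices into a src_fps frame sequence that approximate dst_fps sampling.

    Output frame i is the latest source frame at or before time i / dst_fps,
    i.e. source index floor(i * src_fps / dst_fps).
    """
    if num_src_frames < 0 or src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame count must be non-negative and frame rates positive")
    step = src_fps / dst_fps
    num_dst_frames = math.floor(num_src_frames * dst_fps / src_fps)
    # min() guards against floating-point error pushing an index past the end.
    return [min(math.floor(i * step), num_src_frames - 1) for i in range(num_dst_frames)]


# 2 seconds of video prepared at 30 fps -> 60 source frames.
idx = resample_indices(60, src_fps=30, dst_fps=16)
print(len(idx), idx[:6])  # 32 [0, 1, 3, 5, 7, 9]
```

Because the stride 30/16 = 1.875 is not an integer, consecutive picks are a mix of one and two source frames apart, so 16 fps drawn from 30 fps material is slightly uneven; sampling at 15 fps, by contrast, simply takes every other frame.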