Closed (ZChengLong578 closed this issue 2 years ago)
Hi @ZChengLong578,
Unfortunately, I cannot really tell what the issue is from the pictures you attached.
However, I noticed that you are using clip_length=4 with frame_rate=15 fps. A clip of only 4 frames (about 0.27 seconds at 15 fps) is too short to tell you anything about the content of the video. My guess is that your network is having a hard time learning because the clips are too short and might look very similar. My suggestion is to increase the clip_length to 16 (just like we did in TSP). However, you might need to increase the frame_rate to 30 fps if you have too many annotations shorter than 16/15 ≈ 1.07 seconds.
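To make the arithmetic concrete, here is a minimal sketch of how clip duration follows from clip_length and frame_rate, and how many annotations remain long enough to hold one full clip. The function names and the list of durations are hypothetical and only for illustration; the sketch assumes that any annotation shorter than one clip is dropped, which may not match exactly what the training code does.

```python
def clip_duration_seconds(clip_length: int, frame_rate: float) -> float:
    """Duration covered by one clip: number of frames divided by frames per second."""
    return clip_length / frame_rate


def fraction_usable(durations, clip_length: int, frame_rate: float) -> float:
    """Fraction of annotations long enough to contain at least one full clip.

    Assumption: annotations shorter than one clip are discarded.
    """
    if not durations:
        return 0.0
    min_duration = clip_duration_seconds(clip_length, frame_rate)
    kept = sum(1 for d in durations if d >= min_duration)
    return kept / len(durations)


if __name__ == "__main__":
    # Made-up annotation lengths in seconds, not taken from any real dataset.
    example_durations = [0.5, 1.0, 1.0, 1.2, 2.0]
    for fps in (15, 16, 30):
        needed = clip_duration_seconds(16, fps)
        kept = fraction_usable(example_durations, 16, fps)
        print(f"frame_rate={fps}: need >= {needed:.3f} s, keep {kept:.0%}")
```

Running this on your own list of annotation durations should show quickly which frame_rate keeps enough of your data for a given clip_length.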
Hope this helps, Humam
Hi, @HumamAlwassel,
Thank you for your reply. I tried setting clip_length to 16, but that alone did not solve the problem of the excessive loss value. I have now removed the background (no action) part and train only on the action part; with this change, setting clip_length to 16 greatly improves the accuracy of the model. However, since my videos are short, nearly 35% of the action videos are removed when the frame_rate is 15, so I tried setting the frame_rate to 30, but the results are a little worse than at 15. Since most of my action videos last a whole number of seconds, more than 96% of them can be used for training if the frame_rate is set to 16. But I noticed that in the data preparation phase the videos were processed at 30 fps, which is exactly double 15. If I want to set frame_rate to 16, do I need to process the videos at 32 fps in the preparation phase?
Hi @ZChengLong578,
No need to change the video data preparation phase. The training code will do the sampling according to the frame_rate you specify as input.
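For intuition, here is a rough sketch of how frames stored at one rate can be resampled to another by picking indices. This is not the actual TSP training code; the function name and the floor-based index selection are assumptions made for illustration only.

```python
import math


def resample_frame_indices(num_frames: int, source_fps: float, target_fps: float):
    """Pick which source frames to keep so playback approximates target_fps.

    Hypothetical helper: steps through the source video in increments of
    source_fps / target_fps and takes the floor of each position.
    """
    step = source_fps / target_fps
    num_out = math.floor(num_frames / step)
    # Clamp to the last valid index to guard against floating-point edge cases.
    return [min(math.floor(i * step), num_frames - 1) for i in range(num_out)]


if __name__ == "__main__":
    # One second of video stored at 30 fps, resampled to 15 fps and to 16 fps.
    print(resample_frame_indices(30, 30, 15))
    print(resample_frame_indices(30, 30, 16))
```

With a scheme like this, the target rate does not have to divide the stored rate evenly: at 16 fps the kept frames are just spaced slightly unevenly, which is why the prepared 30 fps videos can stay as they are.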
Hi, @HumamAlwassel, I'm sorry to bother you again. My previous runs used little or no background (no action). Now I have added more background (no action), but the loss value is very large and does not decrease. The specific situation is shown in the following figure: Here are the files for the training set and validation set: What can I do to solve this problem?