Closed SonNguyen2510 closed 1 year ago
Thanks for your questions. For 1, the default learning rate in the code is set for faster training; for better convergence, please use the learning rate from the paper. For 2, the model in the paper uses the lean one (it is mentioned in the paper, though maybe not clearly). For 3, those hardcoded params are not very important hyperparameters; they are set that way to save storage for the weight files. But in the experiments, yes. For 4, yes, we use 5-fold cross-validation: we train the model 5 times and log the best validation accuracy for each sub-experiment.
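For point 4, the 5-fold protocol described above (train 5 times, each fold serving once as the validation set, log the best validation accuracy per run) can be sketched roughly as follows. This is a generic illustration, not the repository's actual code; `train_and_eval` is a hypothetical placeholder for the per-fold training loop.

```python
import random

def kfold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Split sample indices into k folds; each fold serves once as the val set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

# Train k times; report the best validation accuracy for each sub-experiment.
best_accs = []
for train_idx, val_idx in kfold_indices(100, k=5):
    # best_accs.append(train_and_eval(train_idx, val_idx))  # hypothetical helper
    pass
```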
Thank you very much for the quick answer! I have one more question: in the training phase, for each video sample, do you temporally slice a random window of 16 adjacent frames once, or several times? For example, for a sample video with a duration of 46 frames, do you randomly slice 16 adjacent frames once or more before feeding them to the 3D CNN model? Also, could you tell me where the numbers in `Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])` come from? Thank you very much.
Once per epoch, but several times over the whole training run.
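The sampling scheme described above (one random window of 16 adjacent frames per video per epoch, so different windows across epochs) might look something like the following sketch. The function name and interface are illustrative, not taken from the repository.

```python
import random

def sample_clip(num_frames: int, clip_len: int = 16) -> range:
    """Randomly pick `clip_len` adjacent frame indices from a video.

    Called once per video per epoch, so over many epochs the model
    sees different temporal crops of the same video.
    """
    if num_frames < clip_len:
        raise ValueError("video is shorter than the clip length")
    start = random.randint(0, num_frames - clip_len)  # inclusive upper bound
    return range(start, start + clip_len)

# e.g. a 46-frame video yields one random 16-frame window each epoch
clip = sample_clip(46)
```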
The normalize parameters could also be N(0, 1), but I take the per-channel RGB statistics from the PyTorch official models, which are computed on ImageNet. Of course, it's better to compute them on the specific dataset.
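To make the two options above concrete, here is a small sketch in NumPy: channel-wise normalization with the ImageNet statistics quoted in the question, plus a helper for computing dataset-specific statistics instead (the function names are illustrative).

```python
import numpy as np

# Per-channel RGB statistics used by the official PyTorch ImageNet models
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(frames: np.ndarray, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Channel-wise (x - mean) / std for frames shaped (..., 3), values in [0, 1]."""
    return (frames - mean) / std

def dataset_stats(frames: np.ndarray):
    """Per-channel mean/std computed on your own dataset (usually preferable)."""
    axes = tuple(range(frames.ndim - 1))  # reduce over everything but channels
    return frames.mean(axis=axes), frames.std(axis=axes)
```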
Thank you very much sir!
First of all, I would like to thank you for contributing this great work to the community. I'm really interested in your study, and I am trying to run your code with the same experimental setup as the Hockey dataset experiments in your paper. However, a few things are ambiguous to me: