Closed · memoryjing closed this issue 2 years ago
Dear Kyle, thanks for your excellent work; it has helped me a lot. I have a question about `maxlen` in your code. Do you set `maxlen` to 200 for both datasets (THUMOS14 and ActivityNet)? I found that video lengths in THUMOS14 vary greatly; some videos are longer than 26 minutes (2437 snippets). I am not sure how you set `maxlen` for THUMOS14.

Hi Jingjing, thank you for your interest. Yes, I set `maxlen` to 200 for both datasets. It was an empirical decision: I ran multiple experiments with maxlen = {1000, 750, 500, 200, 100, 75, ...} on THUMOS14 and found that 200 gives the best performance among these values. I think a reasonably small `maxlen` helps reduce overfitting during training.
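A minimal sketch of this kind of fixed-length random sampling, assuming snippets are drawn uniformly without replacement (with replacement when the video is shorter than `maxlen`) and kept in temporal order; the function name and details are illustrative, not the repository's actual code:

```python
import numpy as np

def sample_training_snippets(features, maxlen=200):
    """Draw a fixed-length training input from a variable-length video.

    features: (num_snippets, feat_dim) array of per-snippet features.
    Returns a (maxlen, feat_dim) array of randomly sampled snippets,
    kept in temporal order.
    """
    num_snippets = features.shape[0]
    # Sample without replacement when possible; with replacement when
    # the video has fewer than `maxlen` snippets, so the output shape
    # is fixed either way.
    replace = num_snippets < maxlen
    idx = np.random.choice(num_snippets, size=maxlen, replace=replace)
    idx.sort()  # preserve temporal order among the sampled snippets
    return features[idx]
```

Fixing the input length this way also caps memory use per training batch, regardless of the original video length.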
Ok, got it. Thank you very much for your quick reply.
Dear Kyle, how do you get T=200 segments from the whole video, and how do you perform localization at test time? You say in the paper that you follow W-TALC. I found that W-TALC selects T continuous snippets for training and thresholds the final T-CAM for localization. Since the snippet-level resolution is coarser than the frame rate of the videos, they upsample the T-CAM to match the original frame rate. I am not sure how W-TALC and your A2CL-PT select T snippets at test time, because if T continuous snippets are also used for testing, directly upsampling the T-CAM to the original frame rate might not be correct.
Hi again,
I think you misunderstood the inference procedure of our method and W-TALC.
1) During training, the input snippets are randomly sampled. Please refer to this line.
2) Our method and W-TALC do not upsample the T-CAM. Please refer to this line for details of the testing scheme.
3) We do not use sampled snippets during inference. Please refer to this line.
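To make points 2) and 3) concrete, here is a hedged sketch of this style of inference: the T-CAM is computed over all snippets of the test video, thresholded, and consecutive above-threshold snippets are grouped into detections whose boundaries are converted to seconds via the snippet stride, so no upsampling of the T-CAM is needed. The function name, the threshold, and the 16-frame / 25-fps values are illustrative assumptions, not the repository's exact code:

```python
import numpy as np

def tcam_to_detections(tcam, threshold=0.5, fps=25.0, stride=16):
    """Convert a per-class T-CAM over ALL snippets of a test video into
    temporal detections, without upsampling the T-CAM.

    tcam: (T,) array of activation scores for one class, one score per
          snippet (each snippet covers `stride` frames of the video).
    Returns a list of (start_sec, end_sec, score) tuples.
    """
    above = tcam > threshold
    detections = []
    t = 0
    while t < len(tcam):
        if above[t]:
            start = t
            while t < len(tcam) and above[t]:
                t += 1
            end = t  # exclusive snippet index
            # Map snippet indices back to seconds via the snippet stride,
            # instead of upsampling the T-CAM to the frame rate.
            start_sec = start * stride / fps
            end_sec = end * stride / fps
            detections.append((start_sec, end_sec, float(tcam[start:end].mean())))
        else:
            t += 1
    return detections
```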