Open flying-hou opened 3 years ago
Thanks for your attention to our work. We train our model on 8 NVIDIA Tesla V100 SXM2 32 GB GPUs. In practice, we prefer to use our pre-trained still-image detector (which takes 4.8 hours to train with ResNet50) as the pre-trained model. The training and inference times for different numbers of reference frames are shown below.
| Number of reference images | 2 | 4 | 8 | 14 |
| --- | --- | --- | --- | --- |
| Training time (hours) | 4.9 | 6.7 | 9.8 | 12.7 |
| Inference time (s per image) | 0.2320 | 0.2527 | 0.3447 | 0.6241 |
| mAP (%) | 77.7 | 78.3 | 79.0 | 79.9 |
Thank you for your prompt reply.
We would like to update our inference times, because the figures in the previous response include loss computation, mAP computation, result writing, and so on. With 2, 4, 8, and 14 reference images, the inference time is 88 ms, 123 ms, 213 ms, and 341 ms, respectively.
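For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing only the forward pass, so that loss computation, mAP computation, and result writing stay outside the timed region. It is not the code used for the numbers above; `mean_forward_ms` and `dummy_forward` are hypothetical names, and the stand-in function would need to be replaced with the real detector call.

```python
import time


def mean_forward_ms(forward, batches, warmup=2):
    """Return the mean wall-clock time of forward(batch) in milliseconds."""
    batches = list(batches)
    if not batches:
        raise ValueError("need at least one batch to time")
    # Warm-up calls are not timed, so one-time setup cost does not skew the mean.
    for batch in batches[:warmup]:
        forward(batch)
    total = 0.0
    for batch in batches:
        start = time.perf_counter()
        forward(batch)  # only the forward pass sits inside the timed region
        # On a GPU, kernels run asynchronously; with PyTorch one would call
        # torch.cuda.synchronize() here before reading the clock.
        total += time.perf_counter() - start
    return 1000.0 * total / len(batches)


def dummy_forward(batch):
    # Hypothetical stand-in for the detector's forward pass.
    return sum(batch)


if __name__ == "__main__":
    data = [[1, 2, 3]] * 10
    print(f"mean forward time: {mean_forward_ms(dummy_forward, data):.4f} ms")
```

Loss, evaluation, and file output would run after the timed call, which is why a measurement taken this way can come out much lower than an end-to-end time per image.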
Is the training time for one epoch or for 10 epochs? How many epochs were these models trained for?
After reading your paper, I was deeply inspired. Your work has led to the successful application of Transformers to VOD. However, there are three questions I want to ask: