Hon-Wong / PTSEFormer

[ECCV 2022] PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection
https://arxiv.org/abs/2209.02242
MIT License

About single frame baseline training #3

Closed sfwang20 closed 1 year ago

sfwang20 commented 2 years ago

Thanks for your excellent work. I have some questions about the single-frame baseline training on the VID dataset: How many epochs was the single-frame baseline trained for, and what is the learning rate drop schedule? Did you use a checkpoint pretrained on COCO for the single-frame baseline? I have tried for a long time to reproduce the single-frame baseline mAP (81.2%), but I have not been able to.

Thank you very much!

Angknpng commented 2 years ago

Hi! I'm excited to see that you are also following this work. I have some questions of my own.

How much GPU memory do you use to train the model? I failed to train a model with two reference frames on 4 RTX 3090 graphics cards. With only 1 reference frame, the memory usage per card is 17 GB.

In addition, when I first trained the model from this paper, I hit a tensor dimension error, so I modified 'activation.py' to adjust the tensor dimension. I don't know whether this is the reason for the abnormal GPU memory usage.

Sincerely!

Hon-Wong commented 1 year ago

Hi,

Thanks for your attention! We use Deformable DETR with 4 FPN layers as our baseline model. We train it for 50 epochs on ImageNet VID and DET, with an LR drop at the 40th epoch. The initial LR is set to 1e-4 (1e-5 for the backbone). It is not pretrained on COCO.

BW, Han Wang
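
The schedule described in the reply above can be sketched as a small step function. This is only an illustration, not the authors' training code: the 50 epochs, the drop at epoch 40, and the two initial learning rates come from the reply, while the drop factor of 0.1, the 0-indexed epoch counting, and the names `BASE_LRS`, `GAMMA`, and `lr_at_epoch` are assumptions made for the example.

```python
# Hypothetical sketch of the single-frame baseline LR schedule.
# From the thread: 50 epochs, LR drop at epoch 40, LR 1e-4 (backbone 1e-5).
# Assumed (not stated in the thread): drop factor 0.1, 0-indexed epochs.

EPOCHS = 50
DROP_EPOCH = 40
GAMMA = 0.1  # assumed multiplicative drop factor
BASE_LRS = {"transformer": 1e-4, "backbone": 1e-5}


def lr_at_epoch(group: str, epoch: int) -> float:
    """Return the learning rate of a parameter group at a 0-indexed epoch."""
    if not 0 <= epoch < EPOCHS:
        raise ValueError(f"epoch must be in [0, {EPOCHS}), got {epoch}")
    # Step schedule: the rate is scaled once, from DROP_EPOCH onward.
    factor = GAMMA if epoch >= DROP_EPOCH else 1.0
    return BASE_LRS[group] * factor


if __name__ == "__main__":
    for epoch in (0, 39, 40, 49):
        print(epoch, lr_at_epoch("transformer", epoch), lr_at_epoch("backbone", epoch))
```

In a PyTorch training script, the same idea would typically be expressed as two optimizer parameter groups plus a step scheduler such as `torch.optim.lr_scheduler.StepLR`. Whether the repository does exactly this is not confirmed in the thread.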

Hon-Wong commented 1 year ago

Hi,

Thanks for your attention! I use 8 x V100 (32G) to train the model. This work needs a large amount of GPU memory.

BW, Han Wang