dzh19990407 / LBDT

CVPR2022 - Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
MIT License

Training strategies on A2D datasets #2

Closed LJQbiu closed 2 years ago

LJQbiu commented 2 years ago

Following the hyperparameters in the paper (LBDT_4, Adam, batch size = 8, lr = 1e-4), I trained on A2D-Sentences using a 2080 Ti. The best results on the test set are listed below. How should I train to reach the accuracy reported in the paper?

Precision@0.5: 0.641, Precision@0.6: 0.571, Precision@0.7: 0.484, Precision@0.8: 0.325, Precision@0.9: 0.077, mAP @0.5:0.05:0.95: 0.387, Overall IoU: 0.635, Mean IoU (J): 0.5547, F: 0.6425
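For context, the quoted hyperparameters roughly correspond to a training loop like the sketch below. This is only a minimal PyTorch sketch with a placeholder model and dummy data standing in for LBDT_4 and A2D-Sentences; it is not the repository's actual training script, and only the optimizer and batch settings reflect the numbers above.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data standing in for LBDT_4 and A2D-Sentences.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
frames = torch.randn(64, 3, 224, 224)                    # dummy video frames
masks = torch.randint(0, 2, (64, 1, 224, 224)).float()   # dummy segmentation masks
train_loader = DataLoader(TensorDataset(frames, masks), batch_size=8, shuffle=True)

optimizer = Adam(model.parameters(), lr=1e-4)            # Adam with lr = 1e-4, as quoted
criterion = nn.BCEWithLogitsLoss()

for clip, mask in train_loader:                          # batch of 8 per iteration
    optimizer.zero_grad()
    loss = criterion(model(clip), mask)
    loss.backward()
    optimizer.step()
```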

dzh19990407 commented 2 years ago

Thank you for following our work! How many GPUs did you use? We train our model with 8 GPUs, so the total batch size may be the cause of this gap. In any case, we will further validate the released code and reply to you as soon as possible.

LJQbiu commented 2 years ago

Thanks for your quick reply. I only use 1 GPU! The gap may come from optimization differences caused by the smaller batch size.
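If the paper's batch size of 8 is the per-GPU batch, 8 GPUs give an effective batch of 64, while a single 2080 Ti only sees 8. One common workaround is gradient accumulation; the sketch below reuses the placeholder setup from the earlier snippet and is an assumption, not the authors' recipe.

```python
accum_steps = 8  # assume 8 GPUs x per-GPU batch 8 -> emulate an effective batch of 64

optimizer.zero_grad()
for step, (clip, mask) in enumerate(train_loader):
    loss = criterion(model(clip), mask) / accum_steps  # scale so accumulated gradients average out
    loss.backward()                                    # gradients accumulate across steps
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that this only matches the gradient statistics: batch-norm running statistics and any step-based learning-rate schedule would still differ from true 8-GPU training.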

dzh19990407 commented 2 years ago

We retrained our LBDT model with the released code on 8 GPUs, and it achieves accuracy similar to that reported in the paper. We will upload a checkpoint that reproduces the paper's numbers.
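Once the checkpoint is up, loading it for evaluation would presumably look like the snippet below; the file name and checkpoint layout are guesses, not the repository's documented interface.

```python
import torch

checkpoint = torch.load("lbdt_4_a2d.pth", map_location="cpu")  # hypothetical file name
state_dict = checkpoint.get("model", checkpoint)               # handle plain or wrapped state dicts
model.load_state_dict(state_dict)                              # `model` built by the repository's code
model.eval()
```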

LJQbiu commented 2 years ago

Got it! Thank you for your reply!