czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

Reproducing Augreg-B results #83

Open dbaranchuk opened 1 year ago

dbaranchuk commented 1 year ago

Hey, thanks for your great work!

We are trying to reproduce the results for the Augreg-B setting and compare our logs with yours. We prepared the same environment, used this config, ran on 8xA100, and set the same seed.

Here are the rows from your and our logs:

Ours: Iter [160000/160000] lr: 1.250e-10, eta: 0:00:00, time: 0.309, data_time: 0.005, memory: 56916, decode.loss_ce: 0.1827, decode.acc_seg: 92.3258, aux.loss_ce: 0.0897, aux.acc_seg: 90.9583, loss: 0.2724

Yours: Iter [160000/160000] lr: 1.250e-10, eta: 0:00:00, time: 0.310, data_time: 0.004, memory: 56778, decode.loss_ce: 0.1277, decode.acc_seg: 76.1168, aux.loss_ce: 0.0600, aux.acc_seg: 75.5203, loss: 0.1876

There is a drastic difference in the loss and accuracy values, and we reproduce it for other seeds as well. Overall mIoU: ours 50.54 vs. yours 51.67, which also looks outside the standard-deviation range.

Do you have any idea what could cause this discrepancy? Comparing the configs, we noticed that you used a ViTAdapterDenseV2 model with a different drop_path_rate and some unknown arguments. Could that be the cause?
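When chasing this kind of discrepancy, a small recursive diff over the two config dicts makes the divergent fields (such as drop_path_rate or extra arguments) easy to spot. A minimal sketch, assuming mmcv-style nested dict configs; the example values below are hypothetical, not the repo's actual settings:

```python
def diff_config(a, b, prefix=""):
    """Recursively compare two nested dict configs; yield (key_path, a_val, b_val)
    for every leaf that differs or is missing on one side."""
    for k in sorted(set(a) | set(b)):
        path = f"{prefix}{k}"
        va, vb = a.get(k, "<missing>"), b.get(k, "<missing>")
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from diff_config(va, vb, prefix=path + ".")
        elif va != vb:
            yield (path, va, vb)

# Hypothetical configs, just to show the output shape:
ours = dict(model=dict(type="ViTAdapter", drop_path_rate=0.1))
theirs = dict(model=dict(type="ViTAdapterDenseV2", drop_path_rate=0.3))
print(list(diff_config(ours, theirs)))
# → [('model.drop_path_rate', 0.1, 0.3),
#    ('model.type', 'ViTAdapter', 'ViTAdapterDenseV2')]
```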

czczup commented 1 year ago

Thanks for your feedback. The issue may have arisen from a change I made to the drop-path rate while reorganizing the code. Please give me a few days to confirm this. Thanks for your patience.

dbaranchuk commented 1 year ago

I've also tried setting drop-path-rate=0.1:

Iter [160000/160000] lr: 1.250e-10, eta: 0:00:00, time: 0.309, data_time: 0.005, memory: 56916, decode.loss_ce: 0.1292, decode.acc_seg: 94.3397, aux.loss_ce: 0.0613, aux.acc_seg: 93.3438, loss: 0.1905

The loss now seems reproduced, and the different acc values are caused by a bug fix in this mmseg PR - so I guess the results are reproduced, except that the final mIoU is still 50.73.

czczup commented 1 year ago

I set drop-path-rate=0.1, and the results are as follows: [training-log screenshot] The best val mIoU is 51.35. Another difference is layer_scale: in the previous implementation, layer_scale was set to False, but now it is set to True by default. I will confirm this.
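For readers unfamiliar with the flag: layer_scale here presumably refers to the CaiT-style learnable per-channel scaling of the residual branch, so toggling it changes the block update from x + f(x) to x + gamma * f(x) with gamma initialized near zero. A minimal numpy sketch of that difference (function and variable names are illustrative, not the repo's code):

```python
import numpy as np

def block_output(x, branch_out, gamma=None):
    """Residual update of one transformer block.
    gamma=None  -> layer_scale=False: plain residual x + f(x)
    gamma given -> layer_scale=True:  x + gamma * f(x), gamma learnable"""
    if gamma is None:
        return x + branch_out
    return x + gamma * branch_out

x = np.ones(4)              # block input
f = np.full(4, 2.0)         # pretend branch output f(x)
gamma = np.full(4, 1e-6)    # CaiT-style small initialization

print(block_output(x, f))          # → [3. 3. 3. 3.]
print(block_output(x, f, gamma))   # ≈ 1.000002 everywhere: branch nearly muted at init
```

With the small initialization, layer_scale=True starts training with residual branches almost switched off, which can noticeably change optimization dynamics versus layer_scale=False.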

dbaranchuk commented 1 year ago

FYI, I re-ran exactly the same experiment and got a best val mIoU of 51.21 instead of 50.73. So maybe deterministic=False can also cause a noticeable difference.
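That would be consistent with how mmcv-style seeding usually works: with deterministic=False, the CPU-side RNGs are seeded but cuDNN is still free to pick non-deterministic kernels, so two runs with the same seed can drift apart on GPU. A framework-free sketch of such a seeding helper (the torch/cuDNN lines are shown as comments; this is an assumption about the training setup, not the repo's exact code):

```python
import random
import numpy as np

def set_random_seed(seed, deterministic=False):
    """Sketch of mmcv-style seeding. Even with a fixed seed, leaving
    deterministic=False lets cuDNN autotune non-deterministic kernels,
    which is one plausible source of run-to-run mIoU noise."""
    random.seed(seed)
    np.random.seed(seed)
    # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
    # if deterministic:
    #     torch.backends.cudnn.deterministic = True
    #     torch.backends.cudnn.benchmark = False

set_random_seed(0)
a = np.random.rand(3)
set_random_seed(0)
b = np.random.rand(3)
print(np.allclose(a, b))  # → True: CPU-side RNG is reproducible either way
```

The GPU-side nondeterminism is exactly the part this sketch cannot show, which is why comparing single runs (51.21 vs. 50.73) may just be sampling noise rather than a config difference.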