dbaranchuk opened this issue 1 year ago
Thanks for your feedback. The discrepancy may come from a change I made to drop-path-rate while reorganizing the code. Please give me a few days to confirm this. Thanks for your patience.
I've also tried setting drop-path-rate=0.1:
Iter [160000/160000] lr: 1.250e-10, eta: 0:00:00, time: 0.309, data_time: 0.005, memory: 56916, decode.loss_ce: 0.1292, decode.acc_seg: 94.3397, aux.loss_ce: 0.0613, aux.acc_seg: 93.3438, loss: 0.1905
The loss now seems to be reproduced, and the different acc values are caused by a bug fix in this mmseg PR. So I'd say the results are reproduced, except for the final mIoU, which is still 50.73.
I set drop-path-rate=0.1 and got a best val mIoU of 51.35. Another difference is layer_scale: in the previous implementation it was set to False, but now it defaults to True. I will check whether this accounts for the gap.
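For context on why drop-path-rate can shift the final mIoU: in many ViT backbones (for example timm-style implementations) the value is the *maximum* stochastic-depth rate, spread linearly from 0 at the first block to drop-path-rate at the last. The sketch below assumes ViT-Adapter follows that common convention and a 12-block ViT-B backbone; neither is confirmed in this thread, and `drop_path_schedule` is a hypothetical helper, not part of the repository.

```python
from typing import List


def drop_path_schedule(drop_path_rate: float, depth: int) -> List[float]:
    """Per-block stochastic-depth rates rising linearly from 0 to drop_path_rate.

    Assumption: mirrors the timm-style torch.linspace(0, rate, depth) convention;
    this is not taken from ViT-Adapter's own code.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    if depth == 1:
        # linspace with a single step returns only the start value.
        return [0.0]
    step = drop_path_rate / (depth - 1)
    return [i * step for i in range(depth)]


if __name__ == "__main__":
    # Compare the per-block rates implied by two different settings.
    for rate in (0.1, 0.3):
        rates = drop_path_schedule(rate, depth=12)
        print(rate, [round(r, 3) for r in rates])
```

Under this convention, changing drop-path-rate changes the regularization of every block except the first, which is one plausible reason a config mismatch would move the final numbers.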
FYI, I re-ran exactly the same experiment and got a best val mIoU of 51.21 instead of 50.73, so deterministic=False may also cause a noticeable difference.
Hey, thanks for your great work!
We are trying to reproduce the results for the Augreg-B setting and are comparing our logs with yours. We prepared the same environment, used this config, ran on 8xA100 and set the same seed.
Here are the final rows from our logs and yours:
Ours: Iter [160000/160000] lr: 1.250e-10, eta: 0:00:00, time: 0.309, data_time: 0.005, memory: 56916, decode.loss_ce: 0.1827, decode.acc_seg: 92.3258, aux.loss_ce: 0.0897, aux.acc_seg: 90.9583, loss: 0.2724
Yours: Iter [160000/160000] lr: 1.250e-10, eta: 0:00:00, time: 0.310, data_time: 0.004, memory: 56778, decode.loss_ce: 0.1277, decode.acc_seg: 76.1168, aux.loss_ce: 0.0600, aux.acc_seg: 75.5203, loss: 0.1876
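To compare the two rows field by field, a small parser can help. This is a hypothetical helper (`parse_log_line` and `diff_logs` are illustrative names, not part of mmseg) that assumes the `key: value, ...` layout shown in the rows above; non-numeric fields such as `eta` are skipped.

```python
import re
from typing import Dict

# "key: value" pairs where the value is a plain number followed by "," or end of line.
# The lookahead rejects "eta: 0:00:00", so only numeric fields are captured.
_PAIR = re.compile(r"([\w.]+): ([-+0-9.eE]+)(?=,|\s*$)")

OURS_LINE = (
    "Iter [160000/160000] lr: 1.250e-10, eta: 0:00:00, time: 0.309, data_time: 0.005, "
    "memory: 56916, decode.loss_ce: 0.1827, decode.acc_seg: 92.3258, "
    "aux.loss_ce: 0.0897, aux.acc_seg: 90.9583, loss: 0.2724"
)
THEIRS_LINE = (
    "Iter [160000/160000] lr: 1.250e-10, eta: 0:00:00, time: 0.310, data_time: 0.004, "
    "memory: 56778, decode.loss_ce: 0.1277, decode.acc_seg: 76.1168, "
    "aux.loss_ce: 0.0600, aux.acc_seg: 75.5203, loss: 0.1876"
)


def parse_log_line(line: str) -> Dict[str, float]:
    """Extract numeric fields from one mmseg-style training-log line."""
    fields: Dict[str, float] = {}
    for key, value in _PAIR.findall(line.strip()):
        try:
            fields[key] = float(value)
        except ValueError:
            # Skip anything that looks numeric to the regex but is not a float.
            continue
    return fields


def diff_logs(ours: str, theirs: str) -> Dict[str, float]:
    """Return ours - theirs for every numeric field present in both lines."""
    a, b = parse_log_line(ours), parse_log_line(theirs)
    return {k: a[k] - b[k] for k in a if k in b}


if __name__ == "__main__":
    for key, delta in sorted(diff_logs(OURS_LINE, THEIRS_LINE).items()):
        print(f"{key:>16}: {delta:+.4f}")
```

Printing the per-field deltas makes it easy to see that the acc_seg gap is far larger than the loss gap, which fits the explanation that the accuracy metric itself changed between mmseg versions.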
There is a drastic difference in the loss and acc values, and we see the same gap with other seeds. Overall mIoU: ours 50.54, yours 51.67, which also looks outside the standard deviation range.
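One way to make "outside the standard deviation range" concrete is a z-score of the reported number against your own repeated runs. The sketch below uses only the standard library; `z_score` is a hypothetical helper, and the example inputs are the two re-runs of the same config reported earlier in this thread (50.73 and 51.21) against the reported 51.67.

```python
import statistics
from typing import Sequence


def z_score(runs: Sequence[float], reference: float) -> float:
    """How many sample standard deviations `reference` lies from the mean of `runs`."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    mean = statistics.mean(runs)
    std = statistics.stdev(runs)  # sample (n - 1) standard deviation
    if std == 0:
        raise ValueError("runs have zero spread; z-score is undefined")
    return (reference - mean) / std


if __name__ == "__main__":
    # mIoU from two re-runs of the same config, compared with the reported value.
    runs = [50.73, 51.21]
    print(f"z = {z_score(runs, 51.67):.2f}")
```

With only two runs the standard deviation estimate is very noisy, so a z-score of about 2 here is weak evidence either way; several more seeds would make the comparison meaningful.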
Do you have any idea what could cause this discrepancy? We compared the configs and noticed that you used a ViTAdapterDenseV2 model with a different drop_path_rate and some arguments we don't recognize. Could that be the cause?