Yes, you are right. For the No. Def model, I follow the same training protocol (optimizer, lr_scheduler, etc.) as previous work, while in BiDO I switch to Adam simply because I find it works better than SGD.
Hi,
Thank you for sharing this amazing work.
I am currently working with your code and am curious about the difference between a model trained with BiDO (lambda_x=0 and lambda_y=0) and the No. Def model.
Based on my understanding, they should be similar? In your code, I noticed that the difference comes from the hyper-parameters (learning rate, learning rate scheduler, etc.)?
Thank you very much.
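
To make the point in this thread concrete, here is a minimal sketch of why the two setups should share the same objective. It assumes the BiDO loss has the form of a task loss plus a lambda_x-weighted dependency penalty minus a lambda_y-weighted dependency reward. The function name `bido_objective`, the argument names, the sign convention, and the numbers are all hypothetical and not taken from the repository.

```python
# Hypothetical sketch: names, sign convention, and values are assumptions,
# not the repository's actual code.

def bido_objective(task_loss, dep_x, dep_y, lambda_x, lambda_y):
    """Task loss plus weighted dependency terms.

    dep_x: dependency between inputs and hidden features (penalised).
    dep_y: dependency between hidden features and labels (rewarded).
    """
    return task_loss + lambda_x * dep_x - lambda_y * dep_y


# Made-up illustrative values for a single training step.
task_loss, dep_x, dep_y = 0.7, 3.2, 1.5

no_def = task_loss  # the undefended model optimises the task loss only
bido_zero = bido_objective(task_loss, dep_x, dep_y, lambda_x=0.0, lambda_y=0.0)

# With both weights at zero, the dependency terms vanish.
print(bido_zero == no_def)
```

Under these assumptions the objectives coincide, so any remaining gap between the two models would come from the training setup (Adam versus SGD, learning rate, scheduler), which matches the reply above.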