Introduces an LR variable in run_cvt_train.sh and sets the default LR to 0.000125.
NaNs are observed when running the default test case with a lower batch size (bs=128), as previously noted in https://github.com/microsoft/CvT/issues/4#issuecomment-855193111. This PR introduces a variable to control the learning rate and lowers its default so that NaNs no longer occur in the 16-GPU test case.
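
For reviewers, here is a minimal sketch of how an overridable LR default can look in a bash script. It is not the actual diff: only the `LR` variable name and the 0.000125 default come from this PR. The `build_train_args` helper and the `TRAIN.LR` override key are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real contents of run_cvt_train.sh.

# Use the caller's LR if one is set; otherwise fall back to 0.000125.
LR="${LR:-0.000125}"

# build_train_args is a hypothetical helper. It prints the config override
# that a training command could receive. The TRAIN.LR key is an assumption.
build_train_args() {
    printf 'TRAIN.LR %s\n' "$LR"
}

build_train_args
```

With this pattern, a different value could be tried without editing the script, for example `LR=0.0001 bash run_cvt_train.sh`, assuming the script reads LR from the environment as sketched.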
@mindest @gopitk
Thank you both for the comments. Is this okay to merge? Please let me know if there's anything else to address. NaNs are no longer observed after the change.