Closed jataylo closed 1 year ago
NaNs are produced in the CvT model at large device counts (>64 GPUs) due to the scaled LR. This PR disables LR scaling by default for benchmarking purposes.
Previously we reduced the overall LR to alleviate this issue (https://github.com/facebookresearch/FAMBench/commit/58f68431660fd84d1b7fdcaa4b6925dc12fb3bd3), but the scaled learning rate still becomes too large when the world size is large.
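For context, the usual linear LR scaling rule multiplies the base learning rate by the world size, so at 64+ GPUs the effective LR can grow large enough to destabilize training. A minimal sketch of the rule and an opt-in flag (function and flag names here are illustrative, not the actual FAMBench code):

```python
import argparse

def effective_lr(base_lr: float, world_size: int, scale_lr: bool) -> float:
    """Linear LR scaling rule: lr = base_lr * world_size.
    At world_size > 64 this inflates the LR enough to produce NaNs,
    hence scaling is disabled unless explicitly requested."""
    return base_lr * world_size if scale_lr else base_lr

# Hypothetical CLI mirroring the PR's change: scaling is now opt-in.
parser = argparse.ArgumentParser()
parser.add_argument("--scale-lr", action="store_true",
                    help="re-enable linear LR scaling (off by default)")
```

With scaling off, a base LR of 0.5 stays 0.5 regardless of world size; with scaling on at 64 GPUs it would become 32.0.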
LGTM