Currently the implementation overfits aggressively with the following parameters:
"python main.py --batch_size=256 --interval=1 --lr=0.1 --model="VGG('VGG16')" --aux_batch_size=50000 --grad_clip=0.45 --descending --epoch_skip=10"
I can't figure out why. Maybe someone can take a look and spot the simple oversight that causes this.
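To help narrow down where the overfitting starts, here is a minimal sketch that scans per-epoch losses for the point where training loss keeps falling while validation loss keeps rising. It assumes the losses have been collected into two plain Python lists. The function name `first_overfit_epoch` and the `patience` parameter are hypothetical and are not part of main.py.

```python
def first_overfit_epoch(train_loss, val_loss, patience=3):
    """Return the index of the first epoch in a run of `patience` consecutive
    epochs where training loss falls while validation loss rises.

    Returns None if no such run exists. Both arguments are hypothetical
    per-epoch loss lists; main.py is not known to produce them in this form.
    """
    if len(train_loss) != len(val_loss):
        raise ValueError("train_loss and val_loss must have the same length")

    run = 0  # number of consecutive diverging epochs seen so far
    for i in range(1, len(train_loss)):
        diverging = (train_loss[i] < train_loss[i - 1]
                     and val_loss[i] > val_loss[i - 1])
        run = run + 1 if diverging else 0
        if run == patience:
            # Report the epoch at which the diverging run began.
            return i - patience + 1
    return None


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from this project.
    train = [2.0, 1.5, 1.0, 0.8, 0.6, 0.4, 0.2]
    val = [2.1, 1.7, 1.4, 1.5, 1.6, 1.7, 1.8]
    print(first_overfit_epoch(train, val))
```

Comparing the reported epoch across runs that change one flag at a time (for example, only --lr or only --grad_clip) may help isolate which setting triggers the problem.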