ajbrock / FreezeOut

Accelerate Neural Net Training by Progressively Freezing Layers

fashionMNIST results #3

Open bkj opened 6 years ago

bkj commented 6 years ago

This code is linked from the fashion-mnist repo, w/ very good results. Do you have a script somewhere I might be able to use to reproduce those numbers?

Thanks!

~ Ben

ajbrock commented 6 years ago

Just drop a FashionMNIST data-loader in utils.py and replace the mean / std normalization statistics. Run it with the options that turn FreezeOut off, something like: python train.py --t_0 1.0 --epochs 300 --which_dataset 10

and then whatever args you want for your model of choice. The scripts I used are on another machine I won't have access to for a few months, sorry.
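Off the top of my head, the loader swap would look roughly like this (untested sketch; the function name and return signature are placeholders, not what utils.py actually expects):

```python
# Hypothetical FashionMNIST loader to drop into utils.py -- the name and
# interface here are assumptions, not FreezeOut's actual loader API.
import torch
from torchvision import datasets, transforms


def fashion_mnist_loaders(batch_size=128, root='./data'):
    # Compute the normalization statistics (mean / std) from the training split.
    raw = datasets.FashionMNIST(root, train=True, download=True,
                                transform=transforms.ToTensor())
    pixels = torch.stack([img for img, _ in raw])        # (60000, 1, 28, 28)
    mean, std = pixels.mean().item(), pixels.std().item()

    norm = transforms.Compose([transforms.ToTensor(),
                               transforms.Normalize((mean,), (std,))])
    train = datasets.FashionMNIST(root, train=True, transform=norm)
    test = datasets.FashionMNIST(root, train=False, transform=norm)
    return (torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True),
            torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=False))
```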

Also, my results aren't actually that strong; it's just that no one has ever bothered to properly benchmark that dataset. With a tiny bit of hyperparameter tuning I wouldn't be surprised if you could exceed 98% accuracy, modulo however many mislabeled samples there are in the test set.

bkj commented 6 years ago

OK thanks. Do you have any pointers to a well-tuned model?

~ Ben

ajbrock commented 6 years ago

Probably try one of those shake-shake or shake-drop variants, with a well-tuned SGDR cycle, distillation à la born-again neural nets or stochastic weight averaging, and figure out the right width/depth. Googling any of those terms should point you in the right direction if they're not familiar =)
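For the SGDR piece, recent PyTorch ships a warm-restart cosine scheduler, so the loop is roughly this (self-contained toy sketch; the model, cycle length, and learning rates are placeholders to tune, not recommendations):

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Placeholder model / data just to make the snippet self-contained;
# swap in a real WRN and the FashionMNIST loaders.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
data = torch.utils.data.TensorDataset(torch.randn(512, 1, 28, 28),
                                      torch.randint(0, 10, (512,)))
train_loader = torch.utils.data.DataLoader(data, batch_size=128, shuffle=True)
loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=5e-4, nesterov=True)
# Restart every 10 epochs, doubling the cycle length each time (SGDR-style).
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

for epoch in range(50):
    for i, (x, y) in enumerate(train_loader):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        # Fractional-epoch stepping gives a smooth cosine decay within each cycle.
        scheduler.step(epoch + i / len(train_loader))
```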

bkj commented 6 years ago

Yeah I bet that would do it... but have you ever seen a model that uses all of those techniques? I'm not sure any datasets are properly benchmarked by your definition! :)

Related -- have you ever seen any code implementing born-again networks? It's fairly simple, but the details can be finicky.

~ Ben

ajbrock commented 6 years ago

Hah, I'm not saying that datasets that haven't been hit with the latest-and-greatest aren't properly benchmarked, but no one's beaten a vanilla WRN40-4 with standard data aug. Kagglers would have a field day =p

As to BANs, you can try asking Tommaso, but I'm not familiar with any implementations.
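The core step is short enough to sketch, though (minimal sketch assuming a frozen, already-trained teacher of the same architecture; the alpha / temperature knobs are illustrative, not values from the paper):

```python
import torch
import torch.nn.functional as F

def born_again_step(student, teacher, x, y, optimizer, alpha=0.5, T=1.0):
    """One training step where the student matches the (frozen) teacher's
    softened outputs in addition to the ground-truth labels.
    alpha / T are illustrative knobs, not values from the paper."""
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL between softened distributions (scaled by T^2, standard KD convention).
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction='batchmean') * (T * T)
    ce = F.cross_entropy(student_logits, y)
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Each generation trains a fresh student against the previous generation as teacher, and the paper ensembles the generations at the end.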