ajbrock / SMASH

An experimental technique for efficiently exploring neural architectures.

Output format #3

Open bkj opened 7 years ago

bkj commented 7 years ago

I ran the code, and just want to be clear that I'm understanding the output format.

$ python train.py --which-dataset C10
$ python evaluate.py --SMASH=SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs --which-dataset C10
$ python train.py --SMASH SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs --which-dataset C10
$ tail -n 4 logs/SMASH_Main_SMASH_D12_K4_N8_Nmax64_maxbneck2_SMASH_C10_seed0_100epochs_Rank0_C10_seed0_100epochs_log.jsonl
{"epoch": 98, "train_loss": 0.001096960324814265, "_stamp": 1504033169.525098, "train_err": 0.015555555555555555}
{"epoch": 98, "val_loss": 0.2705254354281351, "_stamp": 1504033174.813815, "val_err": 5.84}
{"epoch": 99, "train_loss": 0.0011473518449727433, "_stamp": 1504033324.084391, "train_err": 0.011111111111111112}
{"epoch": 99, "val_loss": 0.2725760878948495, "_stamp": 1504033329.318958, "val_err": 5.8}

I figure the 5.8 in the last line indicates that I've wound up with a trained model that gets 5.8% error on CIFAR-10 -- is that right? Which number does the 5.8 correspond to in Table 1 of the paper -- SmashV1 = 5.53, SmashV2 = 4.03, or something else? I'm still working through the code, but I wanted to double-check that I understand the inputs/outputs.

Thanks, Ben

ajbrock commented 7 years ago

The 5.8 there looks like error on the validation split (i.e., training on 45,000 images and testing on 5,000 held out from the training set). If you want to use the CIFAR-10 test set, use the validate-test command line arg. You'll also see messages in stdout indicating which split is being used, how many params the model has, and a couple of other debuggy details.
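The split arithmetic described above can be sketched as follows (a minimal illustration; the sizes come from the comment above and the standard CIFAR-10 layout, and the flag name is as described, not verified against the code):

```python
# CIFAR-10 ships with 50,000 training and 10,000 test images.
cifar10_train_total = 50_000
cifar10_test_total = 10_000

# Per the comment above, SMASH carves a validation split out of the
# training set rather than touching the official test set by default.
val_split = 5_000
train_split = cifar10_train_total - val_split
print(train_split)  # 45000

# So the 5.8 "val_err" in the log is error on the 5,000 held-out
# training images; evaluating on the 10,000-image test set requires
# the validate-test command line arg mentioned above.
```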

All the nets in here are SMASHv2, and they have options for lots of variability in how the archs are defined (variable op structure, variable filter sizes, where BN goes, etc.), although at present the defaults do not correspond to the numbers in the paper (which a quick param-count comparison should make clear). I'll be uploading the pre-trained models from the paper soon.

As an aside, I'm currently writing up the documentation (I've been traveling for the last two weeks), which should hopefully make this whole shebang more grokkable. Feel free to ask more questions -- this code is awfully complicated.