MAC-AutoML / rethinking_performance_estimation_in_NAS


Epochs Question #16

Closed: impulsecorp closed this issue 3 years ago

impulsecorp commented 3 years ago

Your program works great, but I have a question about how exactly the epochs part of it works. I understand that your GitHub code does not include the random forest training, so when I run it like:

python augment.py --name=RS_BPE1 --file=random_darts_architecture.txt --data_path=data/ --save_path=experiment/ --batch_size=128 --lr=0.03 --layers=6 --init_channels=8 --epochs=600 --cutout_length=0 --image_size=16

I am using the results of your HPO search, and not actually training the random forest model to get those results. My question is, for Random Search, if it is just randomly training N models from the 100 architectures in the random_darts_architecture.txt file, using your pre-set hyperparameters, why does the accuracy steadily increase with each epoch that I train it for? What I mean is, if I run it for 600 epochs, what exactly is it doing in each epoch that increases the accuracy score? It can't be fine-tuning the random forest model, because that is not included in the repo.

zhengxiawu commented 3 years ago

Actually, the random forest is HPO applied to an HPO method (the NAS algorithm) itself. As for "why does the accuracy steadily increase with each epoch that I train it for": random search keeps the best of the sampled architectures, so the best-so-far result can only improve as more architectures are evaluated. For the random forest model, we found that https://github.com/automl/fanova does exactly the same thing, so we strongly recommend that users use that code instead. We have not had time to clean up our random forest code; it is a mess.
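For intuition, here is a minimal Python sketch of that selection logic: sample architectures, evaluate each under a fixed budget, and keep the best one seen so far. sample_architecture and train_with_budget are hypothetical stand-ins (not functions from this repo), and the toy "accuracy" is just a random number, so only the best-so-far bookkeeping is meaningful:

import random

def sample_architecture(rng):
    # Stand-in: in the repo this corresponds to one genotype from
    # random_darts_architecture.txt, produced by random_darts_generator.py.
    return rng.random()

def train_with_budget(arch, epochs):
    # Stand-in: in the repo this corresponds to one full augment.py run,
    # which yields the final validation accuracy of the trained architecture.
    return arch  # toy: pretend accuracy equals the sampled value

rng = random.Random(0)
best_arch, best_acc = None, float("-inf")
for i in range(100):                            # 100 sampled architectures
    arch = sample_architecture(rng)
    acc = train_with_budget(arch, epochs=100)   # reduced budget (the BPE)
    if acc > best_acc:                          # incumbent only ever improves
        best_arch, best_acc = arch, acc
    print(f"after architecture {i}: best-so-far accuracy {best_acc:.3f}")

The printed best-so-far value is non-decreasing by construction, which is the sense in which random search "steadily increases".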

impulsecorp commented 3 years ago

I am still confused about exactly what it is doing in each epoch. I am running Random Search with your suggested parameters:

python augment.py --name=RS_BPE1 --file=random_darts_architecture.txt --data_path=data/ --save_path=experiment/ --batch_size=128 --lr=0.03 --layers=6 --init_channels=8 --epochs=600 --cutout_length=0 --image_size=16

From reading your paper, my understanding of the Random Search method was that once the best hyperparameters are found (which are already given to us ahead of time in your GitHub repo, so we skip that part), it then trains the 100 randomly generated NNs in random_darts_architecture.txt, and the best-scoring NN is the winner. But based on the output, your program does not seem to be doing that.

Is each epoch fully training a different random NN from the random_darts_architecture.txt file? If so, the main thing I don't understand is why 100% of the time, the first of the 600 epochs always gets the worst score. If it is randomly picking an NN to train with each epoch, using your good default parameters, why would it not randomly sometimes start with a good scoring NN?

Here's what it shows when I run it:

[ec2-user@ip-172-31-8-41 rethinking_performance_estimation_in_NAS]$ python augment.py --name=RS_BPE1 --file=random_darts_architecture.txt --data_path=data/ --save_path=experiment/ --batch_size=128 --lr=0.03 --layers=6 --init_channels=8 --epochs=100 --cutout_length=0 --image_size=28
Using multi genotypes from file
Genotype(normal=[[('dil_conv_5x5', 0), ('skip_connect', 1)], [('dil_conv_5x5', 2), ('dil_conv_3x3', 0)], [('sep_conv_5x5', 2), ('sep_conv_5x5', 1)], [('sep_conv_3x3', 1), ('sep_conv_5x5', 0)]], normal_concat=range(2, 6), reduce=[[('avg_pool_3x3', 1), ('avg_pool_3x3', 0)], [('sep_conv_3x3', 0), ('sep_conv_5x5', 1)], [('sep_conv_3x3', 0), ('sep_conv_3x3', 3)], [('avg_pool_3x3', 2), ('sep_conv_5x5', 1)]], reduce_concat=range(2, 6))

11/26 01:54:54 PM |
11/26 01:54:54 PM | Parameters:
11/26 01:54:54 PM | AUX_WEIGHT=0.4
11/26 01:54:54 PM | BATCH_SIZE=128
11/26 01:54:54 PM | CUTOUT_LENGTH=0
11/26 01:54:54 PM | DATA_LOADER_TYPE=Torch
11/26 01:54:54 PM | DATA_PATH=data/
11/26 01:54:54 PM | DATASET=CIFAR10
11/26 01:54:54 PM | DROP_PATH_PROB=0.2
11/26 01:54:54 PM | EPOCHS=100
11/26 01:54:54 PM | FILE=random_darts_architecture.txt
11/26 01:54:54 PM | FP16=False
11/26 01:54:54 PM | GENOTYPE=Genotype(normal=[[('dil_conv_5x5', 0), ('skip_connect', 1)], [('dil_conv_5x5', 2), ('dil_conv_3x3', 0)], [('sep_conv_5x5', 2), ('sep_conv_5x5', 1)], [('sep_conv_3x3', 1), ('sep_conv_5x5', 0)]], normal_concat=range(2, 6), reduce=[[('avg_pool_3x3', 1), ('avg_pool_3x3', 0)], [('sep_conv_3x3', 0), ('sep_conv_5x5', 1)], [('sep_conv_3x3', 0), ('sep_conv_3x3', 3)], [('avg_pool_3x3', 2), ('sep_conv_5x5', 1)]], reduce_concat=range(2, 6))
11/26 01:54:54 PM | GPUS=[0]
11/26 01:54:54 PM | GRAD_CLIP=5.0
11/26 01:54:54 PM | I=0
11/26 01:54:54 PM | IMAGE_SIZE=28
11/26 01:54:54 PM | INIT_CHANNELS=8
11/26 01:54:54 PM | LAYERS=6
11/26 01:54:54 PM | LR=0.03
11/26 01:54:54 PM | MOMENTUM=0.9
11/26 01:54:54 PM | NAME=RS_BPE1
11/26 01:54:54 PM | PATH=experiment/RS_BPE1/24
11/26 01:54:54 PM | PRINT_FREQ=200
11/26 01:54:54 PM | SAVE_DIR=experiment/
11/26 01:54:54 PM | SAVE_PATH=experiment/
11/26 01:54:54 PM | SEED=2
11/26 01:54:54 PM | WEIGHT_DECAY=0.0003
11/26 01:54:54 PM | WORKERS=4
11/26 01:54:54 PM |

11/26 01:54:54 PM | Logger is set - training start
11/26 01:54:54 PM | Torch version is: 1.0.0
11/26 01:54:54 PM | Torch_vision version is: 0.2.1
Files already downloaded and verified
11/26 01:55:40 PM | Model size = 0.066 MB
11/26 01:55:42 PM | Create model with Full-float data
11/26 01:55:42 PM | Epoch 0 LR 0.03
11/26 01:55:42 PM | Train: [ 1/100] Step 000/390 Loss 3.255 Prec@(1,5) (8.6%, 8.6%)
11/26 01:56:37 PM | Train: [ 1/100] Step 200/390 Loss 2.444 Prec@(1,5) (34.2%, 34.2%)
11/26 01:57:30 PM | Train: [ 1/100] Step 390/390 Loss 2.212 Prec@(1,5) (41.3%, 41.3%)
11/26 01:57:30 PM | train steps: 390
11/26 01:57:30 PM | Train: [ 1/100] Final Prec@1 41.3000%
11/26 01:57:30 PM | Valid: [ 1/100] Step 000/078 Loss 1.284 Prec@(1,5) (59.4%, 59.4%)
11/26 01:57:33 PM | Valid: [ 1/100] Step 078/078 Loss 1.296 Prec@(1,5) (53.0%, 53.0%)
11/26 01:57:33 PM | valid steps: 78
11/26 01:57:33 PM | Valid: [ 1/100] Final Prec@1 52.9900%

11/26 01:57:33 PM | Epoch 1 LR 0.029992598405485973
11/26 01:57:33 PM | Train: [ 2/100] Step 000/390 Loss 1.946 Prec@(1,5) (53.1%, 53.1%)
11/26 01:58:30 PM | Train: [ 2/100] Step 200/390 Loss 1.701 Prec@(1,5) (56.0%, 56.0%)
11/26 01:59:23 PM | Train: [ 2/100] Step 390/390 Loss 1.616 Prec@(1,5) (58.2%, 58.2%)
11/26 01:59:23 PM | train steps: 390
11/26 01:59:23 PM | Train: [ 2/100] Final Prec@1 58.1960%
11/26 01:59:23 PM | Valid: [ 2/100] Step 000/078 Loss 1.092 Prec@(1,5) (56.2%, 56.2%)
11/26 01:59:26 PM | Valid: [ 2/100] Step 078/078 Loss 1.102 Prec@(1,5) (60.8%, 60.8%)
11/26 01:59:26 PM | valid steps: 78
11/26 01:59:26 PM | Valid: [ 2/100] Final Prec@1 60.8100%

11/26 01:59:26 PM | Epoch 2 LR 0.02997040092642407
11/26 01:59:27 PM | Train: [ 3/100] Step 000/390 Loss 1.324 Prec@(1,5) (68.0%, 68.0%)
11/26 02:00:23 PM | Train: [ 3/100] Step 200/390 Loss 1.425 Prec@(1,5) (63.6%, 63.6%)
11/26 02:01:16 PM | Train: [ 3/100] Step 390/390 Loss 1.379 Prec@(1,5) (64.9%, 64.9%)
11/26 02:01:16 PM | train steps: 390
11/26 02:01:16 PM | Train: [ 3/100] Final Prec@1 64.9000%
11/26 02:01:16 PM | Valid: [ 3/100] Step 000/078 Loss 0.970 Prec@(1,5) (67.2%, 67.2%)
11/26 02:01:19 PM | Valid: [ 3/100] Step 078/078 Loss 0.983 Prec@(1,5) (66.5%, 66.5%)
11/26 02:01:19 PM | valid steps: 78
11/26 02:01:19 PM | Valid: [ 3/100] Final Prec@1 66.5200%

11/26 02:01:19 PM | Epoch 3 LR 0.0299334294690462
11/26 02:01:20 PM | Train: [ 4/100] Step 000/390 Loss 1.094 Prec@(1,5) (73.4%, 73.4%)
11/26 02:02:16 PM | Train: [ 4/100] Step 200/390 Loss 1.251 Prec@(1,5) (68.4%, 68.4%)
11/26 02:03:09 PM | Train: [ 4/100] Step 390/390 Loss 1.227 Prec@(1,5) (69.0%, 69.0%)
11/26 02:03:09 PM | train steps: 390
11/26 02:03:09 PM | Train: [ 4/100] Final Prec@1 69.0100%
11/26 02:03:10 PM | Valid: [ 4/100] Step 000/078 Loss 1.108 Prec@(1,5) (64.8%, 64.8%)
11/26 02:03:12 PM | Valid: [ 4/100] Step 078/078 Loss 1.173 Prec@(1,5) (62.2%, 62.2%)
11/26 02:03:12 PM | valid steps: 78
11/26 02:03:12 PM | Valid: [ 4/100] Final Prec@1 62.1700%

11/26 02:03:13 PM | Epoch 4 LR 0.02988172051971717
11/26 02:03:13 PM | Train: [ 5/100] Step 000/390 Loss 1.305 Prec@(1,5) (68.0%, 68.0%)
11/26 02:04:09 PM | Train: [ 5/100] Step 200/390 Loss 1.152 Prec@(1,5) (70.8%, 70.8%)
11/26 02:05:02 PM | Train: [ 5/100] Step 390/390 Loss 1.136 Prec@(1,5) (71.6%, 71.6%)
11/26 02:05:02 PM | train steps: 390
11/26 02:05:02 PM | Train: [ 5/100] Final Prec@1 71.5900%
11/26 02:05:03 PM | Valid: [ 5/100] Step 000/078 Loss 0.691 Prec@(1,5) (76.6%, 76.6%)
11/26 02:05:05 PM | Valid: [ 5/100] Step 078/078 Loss 0.829 Prec@(1,5) (71.8%, 71.8%)
11/26 02:05:06 PM | valid steps: 78
11/26 02:05:06 PM | Valid: [ 5/100] Final Prec@1 71.7900%

11/26 02:05:06 PM | Epoch 5 LR 0.029815325108927065
11/26 02:05:06 PM | Train: [ 6/100] Step 000/390 Loss 1.427 Prec@(1,5) (68.0%, 68.0%)
11/26 02:06:02 PM | Train: [ 6/100] Step 200/390 Loss 1.080 Prec@(1,5) (73.1%, 73.1%)
11/26 02:06:55 PM | Train: [ 6/100] Step 390/390 Loss 1.061 Prec@(1,5) (73.5%, 73.5%)
11/26 02:06:55 PM | train steps: 390
11/26 02:06:55 PM | Train: [ 6/100] Final Prec@1 73.4740%
11/26 02:06:56 PM | Valid: [ 6/100] Step 000/078 Loss 0.648 Prec@(1,5) (78.9%, 78.9%)
11/26 02:06:59 PM | Valid: [ 6/100] Step 078/078 Loss 0.823 Prec@(1,5) (72.5%, 72.5%)
11/26 02:06:59 PM | valid steps: 78
11/26 02:06:59 PM | Valid: [ 6/100] Final Prec@1 72.5300%

11/26 02:06:59 PM | Epoch 6 LR 0.02973430876093033
11/26 02:06:59 PM | Train: [ 7/100] Step 000/390 Loss 1.056 Prec@(1,5) (74.2%, 74.2%)
11/26 02:07:55 PM | Train: [ 7/100] Step 200/390 Loss 1.027 Prec@(1,5) (74.4%, 74.4%)

zhengxiawu commented 3 years ago

Sorry for the slow reply. In random search, we first generate the random architectures with:

python random_darts_generator.py --num=100

Then each architecture is trained using the searched BPE. For example,

python augment.py --name=RS_BPE1 --file=random_darts_architecture.txt --data_path=data/ --save_path=experiment/ --batch_size=128 --lr=0.03 --layers=6 --init_channels=8 --epochs=100 --cutout_length=0 --image_size=28

will create a folder named experiment/RS_BPE1/0 on the first run, which saves the training results of that architecture. If you run the command again, it will train the second architecture in random_darts_architecture.txt.
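In other words, the outer random-search loop is driven by rerunning the command once per architecture. Below is a minimal driver sketch under that assumption; read_final_accuracy is a hypothetical helper, and the log filename it parses is a guess that you would adjust to whatever augment.py actually writes in your setup:

import re
import subprocess
from pathlib import Path

CMD = [
    "python", "augment.py", "--name=RS_BPE1",
    "--file=random_darts_architecture.txt", "--data_path=data/",
    "--save_path=experiment/", "--batch_size=128", "--lr=0.03",
    "--layers=6", "--init_channels=8", "--epochs=100",
    "--cutout_length=0", "--image_size=28",
]

def read_final_accuracy(run_dir):
    # Hypothetical helper: pull the last "Valid ... Final Prec@1 XX.XXXX%"
    # value out of the run's log (format taken from the output shown above).
    # The log filename is an assumption.
    text = Path(run_dir, "RS_BPE1.log").read_text()
    matches = re.findall(r"Valid.*Final Prec@1 ([\d.]+)%", text)
    return float(matches[-1]) if matches else float("-inf")

results = []
for i in range(100):                 # one augment.py run per architecture
    subprocess.run(CMD, check=True)  # trains the next genotype in the file
    results.append((i, read_final_accuracy(f"experiment/RS_BPE1/{i}")))

best_idx, best_acc = max(results, key=lambda r: r[1])
print(f"best architecture: index {best_idx}, final accuracy {best_acc:.2f}%")

After all 100 runs, the best-scoring entry corresponds to the winning architecture from random_darts_architecture.txt.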