VITA-Group / AutoSpeech

[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang
https://arxiv.org/abs/2005.03215
MIT License

training search stage is very slow. #8

Closed shanguanma closed 3 years ago

shanguanma commented 3 years ago

While reading your paper and following your code, I ran the command `CUDA_VISIBLE_DEVICES=0 python search.py --cfg exps/search.yaml`. It has been running for about 8 days on a single Quadro RTX 8000 (45 GB of CUDA memory), but has only completed 4 epochs. I don't know why; your paper says the search stage takes about 5 days on a single NVIDIA TITAN RTX GPU.

Here is a summary of the training log:

2021-01-09 13:48:30,025 Namespace(cfg='exps/search.yaml', load_path=None, opts=[], path_helper={'prefix': 'logs_search/search_2021_01_09_13_48_29', 'ckpt_path': 'logs_search/search_2021_01_09_13_48_29/Model', 'log_path': 'logs_search/search_2021_01_09_13_48_29/Log', 'sample_path': 'logs_search/search_2021_01_09_13_48_29/Samples'})
2021-01-09 13:48:30,026 CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATA_DIR: data/VoxCeleb1
  NUM_WORKERS: 0
  PARTIAL_N_FRAMES: 300
  SUB_DIR: merged
  TEST_DATASET: 
  TEST_DATA_DIR: 
MODEL:
  DROP_PATH_PROB: 0.2
  INIT_CHANNELS: 64
  LAYERS: 8
  NAME: model_search
  NUM_CLASSES: 1251
  PRETRAINED: False
PRINT_FREQ: 200
SEED: 3
TRAIN:
  ARCH_BETA1: 0.9
  ARCH_BETA2: 0.999
  ARCH_LR: 0.001
  ARCH_WD: 0.001
  BATCH_SIZE: 2
  BEGIN_EPOCH: 0
  BETA1: 0.9
  BETA2: 0.999
  DROPPATH_PROB: 0.2
  END_EPOCH: 50
  LR: 0.01
  LR_MIN: 0.001
  WD: 0.0003
VAL_FREQ: 5
2021-01-09 13:48:32,472 genotype = Genotype(normal=[('sep_conv_5x5', 1), ('max_pool_3x3', 0), ('dil_conv_5x5', 0), ('dil_conv_3x3', 2), ('skip_connect', 0), ('avg_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 3)], normal_concat=range(2, 6), reduce=[('max_pool_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 2), ('max_pool_3x3', 0), ('dil_conv_5x5', 0), ('dil_conv_3x3', 1), ('sep_conv_3x3', 1), ('sep_conv_3x3', 4)], reduce_concat=range(2, 6))
2021-01-09 13:48:47,236 Epoch: [0][    0/69180] Time 14.752 (14.752)    Data  0.011 ( 0.011)    Loss 7.0746e+00 (7.0746e+00)    Acc@1   0.00 (  0.00)   Acc@5   0.00 (  0.00)   Entropy 2.0794e+00 (2.0794e+00)
2021-01-09 14:03:16,840 Epoch: [0][  200/69180] Time  4.416 ( 4.400)    Data  0.004 ( 0.005)    Loss 2.0358e+01 (1.0089e+01)    Acc@1   0.00 (  0.25)   Acc@5   0.00 (  0.50)   Entropy 2.0788e+00 (2.0792e+00)
2021-01-09 14:15:37,689 Epoch: [0][  400/69180] Time  1.702 ( 4.053)    Data  0.004 ( 0.005)    Loss 1.6950e+01 (1.1480e+01)    Acc@1   0.00 (  0.12)   Acc@5   0.00 (  0.50)   Entropy 2.0772e+00 (2.0786e+00)
2021-01-09 14:21:15,859 Epoch: [0][  600/69180] Time  1.688 ( 3.267)    Data  0.004 ( 0.005)    Loss 1.6810e+01 (1.2325e+01)    Acc@1   0.00 (  0.08)   Acc@5   0.00 (  0.33)   Entropy 2.0757e+00 (2.0779e+00)
2021-01-09 14:26:53,783 Epoch: [0][  800/69180] Time  1.687 ( 2.873)    Data  0.004 ( 0.005)    Loss 1.0024e+01 (1.2812e+01)    Acc@1   0.00 (  0.06)   Acc@5   0.00 (  0.31)   Entropy 2.0742e+00 (2.0772e+00)
.........

2021-01-17 15:30:43,634 Epoch: [4][43800/69180] Time  3.355 ( 2.443)    Data  0.083 ( 0.073)    Loss 6.8538e+00 (6.9083e+00)    Acc@1   0.00 (  0.91)   Acc@5   0.00 (  3.33)   Entropy 1.7882e+00 (1.8038e+00)
2021-01-17 15:40:39,570 Epoch: [4][44000/69180] Time  3.173 ( 2.445)    Data  0.082 ( 0.073)    Loss 6.8132e+00 (6.9083e+00)    Acc@1   0.00 (  0.91)   Acc@5   0.00 (  3.32)   Entropy 1.7889e+00 (1.8037e+00)
2021-01-17 15:51:22,701 Epoch: [4][44200/69180] Time  3.590 ( 2.449)    Data  0.441 ( 0.073)    Loss 5.9202e+00 (6.9080e+00)    Acc@1  50.00 (  0.92)   Acc@5  50.00 (  3.33)   Entropy 1.7913e+00 (1.8037e+00)
2021-01-17 16:02:04,050 Epoch: [4][44400/69180] Time  3.137 ( 2.452)    Data  0.067 ( 0.073)    Loss 6.2018e+00 (6.9080e+00)    Acc@1   0.00 (  0.92)   Acc@5   0.00 (  3.34)   Entropy 1.7874e+00 (1.8036e+00)
2021-01-17 16:12:47,046 Epoch: [4][44600/69180] Time  3.181 ( 2.455)    Data  0.081 ( 0.073)    Loss 7.0442e+00 (6.9075e+00)    Acc@1   0.00 (  0.93)   Acc@5   0.00 (  3.34)   Entropy 1.7865e+00 (1.8035e+00)
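The timing in the log is actually self-consistent with the 8-day report. A back-of-the-envelope check, using the averaged per-iteration time printed at epoch 4 (the numbers below are read off the log above, not re-measured):

```python
# Rough wall-clock estimate from the log's running averages.
ITERS_PER_EPOCH = 69180    # from "Epoch: [4][44600/69180]"
AVG_ITER_SECONDS = 2.455   # running average "( 2.455)" at epoch 4

epoch_hours = ITERS_PER_EPOCH * AVG_ITER_SECONDS / 3600
days_for_4_epochs = epoch_hours * 4 / 24
days_for_50_epochs = epoch_hours * 50 / 24

print(f"{epoch_hours:.1f} h per epoch")            # roughly 47 h
print(f"{days_for_4_epochs:.1f} days for 4 epochs")  # roughly 8 days, matching the report
print(f"{days_for_50_epochs:.0f} days for END_EPOCH=50")
```

So at ~2.45 s per iteration, 4 epochs take about 8 days, and the configured 50 epochs would take far longer than the 5 days quoted in the paper. Two config values worth checking against the paper's setup are `BATCH_SIZE: 2` (which inflates the iteration count per epoch) and `NUM_WORKERS: 0` (which makes data loading single-threaded).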
shaojinding commented 3 years ago

Yes, it is very slow due to the nature of the DARTS algorithm. Please see https://arxiv.org/abs/1806.09055
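For context on why DARTS search is inherently expensive: during search, every edge of the supernet computes *all* candidate operations and mixes their outputs with softmax-normalized architecture weights, so one forward pass costs roughly `num_ops` times that of a single fixed architecture. A minimal toy sketch of the mixed operation (this is an illustration, not the AutoSpeech code; the scalar "ops" stand in for conv/pool layers):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_op(x, ops, alphas):
    """DARTS-style mixed operation: evaluates EVERY candidate op,
    then sums them weighted by softmax(architecture parameters)."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

# Toy candidate "operations" on a scalar input.
ops = [lambda x: x, lambda x: 2 * x, lambda x: x * x]
alphas = [0.0, 0.0, 0.0]  # uniform weights at the start of search

y = mixed_op(3.0, ops, alphas)  # all 3 ops run: (3 + 6 + 9) / 3 = 6.0
```

With 8 candidate operations per edge (as in a typical DARTS search space), each search iteration pays for all of them on every edge, which is the main reason the search stage is so much slower than retraining the final discovered genotype.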