chenxin061 / pdarts

Codes for our paper "Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation"

[Bug] CUDA error: out of memory #10

Closed zihaozhang9 closed 5 years ago

zihaozhang9 commented 5 years ago

Python 3.6.5, torch 0.4.1

I use:

python -u train_search.py \
    --tmp_data_dir /path/to/your/data \
    --save log_path \
    --add_layers 6 \
    --add_layers 12 \
    --dropout_rate 0.1 \
    --dropout_rate 0.4 \
    --dropout_rate 0.7 \
    --note note_of_this_run | tee trainlog.log

log:

Experiment dir : log_pathsearch-note_of_this_run-20190615-024812
06/15 02:48:13 AM GPU device = 0
06/15 02:48:13 AM args = Namespace(add_layers=['0', '6', '12'], add_width=['0'], arch_learning_rate=0.0006, arch_weight_decay=0.001, batch_size=96, cifar100=False, cutout=False, cutout_length=16, drop_path_prob=0.3, dropout_rate=['0.1', '0.4', '0.7'], epochs=25, gpu=0, grad_clip=5, init_channels=16, layers=5, learning_rate=0.025, learning_rate_min=0.0, momentum=0.9, note='note_of_this_run', report_freq=50, save='log_pathsearch-note_of_this_run-20190615-024812', seed=2, tmp_data_dir='/path/to/your/data', train_portion=0.5, weight_decay=0.0003, workers=2)
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /path/to/your/data/cifar-10-python.tar.gz
06/15 02:49:29 AM param size = 1.275834MB
06/15 02:49:29 AM Epoch: 0 lr: 2.500000e-02
Traceback (most recent call last):
  File "train_search.py", line 468, in <module>
    main()
  File "train_search.py", line 158, in main
    train_acc, train_obj = train(train_queue, valid_queue, model, network_params, criterion, optimizer, optimizer_a, lr, train_arch=False)
  File "train_search.py", line 295, in train
    logits = model(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/pdarts/model_search.py", line 139, in forward
    s0, s1 = s1, cell(s0, s1, weights)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/pdarts/model_search.py", line 70, in forward
    s = sum(self.cell_ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/pdarts/model_search.py", line 70, in <genexpr>
    s = sum(self.cell_ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/pdarts/model_search.py", line 33, in forward
    return sum(w * op(x) for w, op in zip(weights, self.m_ops))
  File "/pdarts/model_search.py", line 33, in <genexpr>
    return sum(w * op(x) for w, op in zip(weights, self.m_ops))
RuntimeError: CUDA error: out of memory
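
For anyone puzzled by why the logged `args` show `add_layers=['0', '6', '12']` when only `6` and `12` were passed: this is consistent with the flags being declared with argparse's `action='append'` and a non-empty default list. The sketch below reproduces that behaviour in isolation. The parser definition and its defaults are an assumption inferred from the log, not copied from `train_search.py`.

```python
import argparse

# Hypothetical parser mirroring only the two repeated flags from the command.
# The defaults (['0'] and []) are assumptions chosen to match the logged Namespace.
parser = argparse.ArgumentParser()
parser.add_argument("--add_layers", action="append", default=["0"])
parser.add_argument("--dropout_rate", action="append", default=[])

# Same repeated flags as in the reported command line.
args = parser.parse_args([
    "--add_layers", "6",
    "--add_layers", "12",
    "--dropout_rate", "0.1",
    "--dropout_rate", "0.4",
    "--dropout_rate", "0.7",
])

# With action="append", each occurrence is appended to a copy of the default list.
print(args.add_layers)    # ['0', '6', '12']
print(args.dropout_rate)  # ['0.1', '0.4', '0.7']
```

Note that the values stay strings unless a `type=` is given, which matches the quoted entries in the log.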

zihaozhang9 commented 5 years ago

It looks like this was a problem with the GPU. Training has started now.
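
The thread does not say what the GPU problem was. One thing that can be ruled out before launching a search run is that the device is already partly occupied. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example and are not part of the repository.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" row per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse rows such as '0, 10240, 11178' into a list of dicts (values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    result = subprocess.run(
        QUERY, check=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` just before `train_search.py` shows how much memory is free on the device selected by `--gpu` (device 0 in the log above). If most of it is already in use, the out-of-memory error may come from another process rather than from the search settings.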