Error in search_gumbel - Githubissues

JulietteMarrie commented 2 years ago

Hi,

I tried running the Gumbel-Softmax model with the following parameters:

GPU=0 DATASET=cifar10 MODEL=wresnet40_2 EPOCH=200 BATCH=128 LR=0.1 WD=0.0002 AWD=0.0 ALR=0.001 CUTOUT=16 SAVE=CIFAR10

python train_search_paper.py --unrolled --report_freq 1 --num_workers 0 --epoch ${EPOCH} --batch_size ${BATCH} --learning_rate ${LR} --dataset ${DATASET} --model_name ${MODEL} --save ${SAVE} --gpu ${GPU} --arch_weight_decay ${AWD} --arch_learning_rate ${ALR} --cutout --cutout_length ${CUTOUT}

and I am getting the following error:

Traceback (most recent call last):
  File "train_search_paper.py", line 284, in <module>
    main()
  File "train_search_paper.py", line 175, in main
    train_acc, train_obj = train(train_queue, valid_queue, model, architect, criterion, optimizer, lr)
  File "train_search_paper.py", line 223, in train
    loss.backward()
  File "/scratch/clear/jmarrie/miniconda3/envs/env/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/scratch/clear/jmarrie/miniconda3/envs/env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [105]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Do you have any idea where it comes from?

Thanks,

Juliette
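
For readers who hit the same message, here is a minimal sketch of this class of error. It is not taken from the DADA code and does not claim to be the cause in train_search_paper.py; the function names and the toy parameter `p` are made up for illustration. It only shows that modifying a tensor in place after the forward pass, but before backward(), produces the "modified by an inplace operation" RuntimeError, and that running backward() first avoids it.

```python
import torch


def trigger_inplace_error():
    """Return the RuntimeError message raised by backward(), or None."""
    p = torch.nn.Parameter(torch.ones(3))   # toy parameter (hypothetical)
    loss = (p * p).sum()                    # autograd saves p for the backward pass
    with torch.no_grad():
        p.add_(1.0)                         # in-place update bumps p's version counter
    try:
        loss.backward()                     # saved p is now stale, so this fails
    except RuntimeError as err:
        return str(err)
    return None


def safe_order():
    """Same computation, but backward() runs before the in-place update."""
    p = torch.nn.Parameter(torch.ones(3))
    loss = (p * p).sum()
    loss.backward()                         # gradients computed from the unmodified p
    with torch.no_grad():
        p.add_(1.0)                         # updating afterwards is fine
    return p.grad.clone()


if __name__ == "__main__":
    print(trigger_inplace_error())
    print(safe_order())
```

Calling torch.autograd.set_detect_anomaly(True) before training should make the error report also point at the forward operation whose saved tensor was changed, which may help narrow down where the in-place modification happens.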

hyang428 commented 2 years ago

Hi, Juliette,

Any idea how to solve this issue? I had exactly the same problem in my code. It is most likely caused by the two sampling distributions used in the sample() function, RelaxedBernoulli() and RelaxedOneHotCategorical(). If I pass self.probabilities.detach() and self.ops_weight.detach() to them, the code runs, but self.probabilities and self.ops_weight are then no longer updated. Without detach(), I get the RuntimeError.

Best
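
A small sketch of why the detach() workaround stops the updates, assuming only the public torch.distributions API. The tensor `ops_weight` here is a hypothetical stand-in, not the attribute from the DADA model, and the temperature value is arbitrary. A sample drawn with rsample() from detached logits is cut off from the autograd graph, so no gradient can reach the original parameter.

```python
import torch
from torch.distributions import RelaxedOneHotCategorical


def sample_grad(detach):
    """Return the gradient reaching ops_weight, or None if the graph is cut."""
    torch.manual_seed(0)
    ops_weight = torch.zeros(4, requires_grad=True)   # hypothetical stand-in
    logits = ops_weight.detach() if detach else ops_weight
    dist = RelaxedOneHotCategorical(torch.tensor(0.5), logits=logits)
    sample = dist.rsample()                           # reparameterized sample
    if not sample.requires_grad:
        return None                                   # detached: nothing to backpropagate
    # Any scalar function of the sample works; a weighted sum is used here.
    (sample * torch.arange(4.0)).sum().backward()
    return ops_weight.grad


if __name__ == "__main__":
    print(sample_grad(detach=True))    # no gradient path to ops_weight
    print(sample_grad(detach=False))   # gradient flows through rsample()
```

So detach() hides the error only by removing the architecture parameters from the backward pass, which matches the observation that they stop being updated.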

pengyao96 commented 2 years ago

I have the same problem, and I think it may be a PyTorch version issue. After I installed PyTorch 1.2.0, as listed in the author's requirements, the problem went away.

VDIGPKU / DADA

Error in search_gumbel #27