SamsungLabs / zero-cost-nas

Zero-Cost Proxies for Lightweight NAS
Apache License 2.0

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation #17

Closed zhengyuntao123 closed 1 year ago

zhengyuntao123 commented 1 year ago

Excuse me, when I run python nasbench1_pred.py --dataset cifar10 --start 0 --end 1000, the following error occurs at idx=22: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.DoubleTensor [1, 512, 8, 8]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

zhengyuntao123 commented 1 year ago
idx = 22
Traceback (most recent call last):
  File "E:\jupyter file\zero-cost-nas-main\nasbench1_pred.py", line 110, in <module>
    measures = predictive.find_measures(net,
  File "E:\jupyter file\zero-cost-nas-main\foresight\pruners\predictive.py", line 118, in find_measures
    measures_arr = find_measures_arrays(net_orig, dataloader, dataload_info, device, loss_fn=loss_fn, measure_names=measure_names)
  File "E:\jupyter file\zero-cost-nas-main\foresight\pruners\predictive.py", line 82, in find_measures_arrays
    raise e
  File "E:\jupyter file\zero-cost-nas-main\foresight\pruners\predictive.py", line 67, in find_measures_arrays
    val = measures.calc_measure(measure_name, net_orig, device, inputs, targets, loss_fn=loss_fn, split_data=ds)
  File "E:\jupyter file\zero-cost-nas-main\foresight\pruners\measures\__init__.py", line 47, in calc_measure
    return _measure_impls[name](net, device, *args, **kwargs)
  File "E:\jupyter file\zero-cost-nas-main\foresight\pruners\measures\__init__.py", line 28, in measure_impl
    ret = func(net, *args, **kwargs, **impl_args)
  File "E:\jupyter file\zero-cost-nas-main\foresight\pruners\measures\synflow.py", line 54, in compute_synflow_per_weight
    torch.sum(output).backward(retain_graph=True)
  File "E:\anaconda\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "E:\anaconda\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.DoubleTensor [1, 512, 8, 8]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
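For anyone debugging this class of error, here is a minimal, self-contained reproduction of what the traceback describes (not the repository's code): ReLU's backward pass needs the activation it produced, so mutating that tensor in place bumps its version counter and backward() fails. Wrapping the backward call in torch.autograd.detect_anomaly(), as the hint suggests, additionally prints the forward-pass stack trace of the offending operation.

```python
import torch

# ReluBackward0 saves the ReLU output for the backward pass.
x = torch.ones(3, requires_grad=True)
y = torch.relu(x)

# In-place edit after the forward pass: y's version goes 0 -> 1,
# which is exactly the "is at version 1; expected version 0" error.
y.add_(1)

try:
    # detect_anomaly() also reports which forward op produced the
    # tensor that was later modified in place.
    with torch.autograd.detect_anomaly():
        y.sum().backward()
except RuntimeError as e:
    print(type(e).__name__)
```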
thompsondd commented 1 year ago

I have the same problem. Have you tackled it?

vaenyr commented 1 year ago

Not sure why you see this problem while we did not when we originally ran our experiments (possibly a newer PyTorch version?). However, the simple fix for problems like these is to avoid in-place operations. Please check whether the problem goes away after changing this line: https://github.com/SamsungLabs/zero-cost-nas/blob/main/foresight/models/nasbench1.py#L169 to avoid using +=, and, for nb201, also try changing https://github.com/SamsungLabs/zero-cost-nas/blob/main/foresight/models/nasbench2_ops.py#L107 and line 109 to inplace=False.
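For readers who want to see what that change looks like, here is a hedged sketch (the function and variable names are illustrative, not the repository's; the real edits belong in nasbench1.py line 169 and nasbench2_ops.py lines 107/109). Both fixes replace an in-place mutation with an out-of-place equivalent, so the tensors saved for backward are never modified:

```python
import torch
import torch.nn as nn

# Before (breaks autograd when the mutated tensor is saved for backward):
#     out += shortcut              # in-place residual add
#     relu = nn.ReLU(inplace=True)

# After: build a new tensor instead of mutating the existing one.
def residual_add(out, shortcut):
    return out + shortcut          # out-of-place; `out` keeps version 0

relu = nn.ReLU(inplace=False)      # likewise for the nb201 ops

x = torch.randn(1, 4, requires_grad=True)
y = relu(residual_add(x, x))
y.sum().backward()                 # backward now succeeds
print(x.grad is not None)
```

The trade-off is a little extra memory for the intermediate tensor, which is negligible for these proxy computations.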

thompsondd commented 1 year ago

It works for me. Thank you for getting back to me.