Open jayxio opened 4 years ago
@jayxio Hi. Actually this error didn't occur in my experiments. That step just randomly chooses attention maps using a random-number package, and I believe the computation graph only needs to cache those random indices once, so the backward pass flows smoothly over the selected attention maps. I think Python's `random` module, NumPy, or any other package is fine for this step.
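To illustrate the point about random indices being plain constants in the graph, here is a minimal sketch (the shapes and names are made up, not the repo's actual code): picking map indices with Python's `random` module and then indexing a tensor still lets gradients flow back to the selected maps.

```python
import random
import torch

# Hypothetical shapes: batch of 8 samples, 32 attention maps of size 14x14.
attention_maps = torch.randn(8, 32, 14, 14, requires_grad=True)

# Pick one map index per sample with Python's random module. The indices
# are plain Python ints, so they enter the graph only as constant positions.
indices = [random.randrange(attention_maps.size(1))
           for _ in range(attention_maps.size(0))]

# Index with the constants; this is an ordinary differentiable torch op.
selected = torch.stack([attention_maps[i, k] for i, k in enumerate(indices)])

# Gradients flow back through the indexing: exactly one map per sample
# receives a nonzero gradient.
selected.sum().backward()
nonzero_maps = attention_maps.grad.abs().sum(dim=(2, 3)).count_nonzero().item()
```

So mixing in `random` or NumPy for the *index choice* does not split the graph, as long as the tensors themselves stay in torch.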
Hi mate,
I'm trying to reproduce experiment results using WS-DAN/Xception and I'm impressed by the implementation of the WS-DAN network.
However, in train-wsdan.py, when I try to iterate over the dataloader with `for i, (X, y) in enumerate(data_loader):` and it calls `batch_loss.backward()`, I get the following error:
```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
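For reference, this class of version-counter error can be reproduced in a few lines, independent of WS-DAN (a minimal sketch, not the repo's code):

```python
import torch

a = torch.ones(3, requires_grad=True)
b = torch.exp(a)   # exp's backward needs its output b, so autograd saves it
b += 1             # in-place add bumps b's version counter

err = None
try:
    b.sum().backward()
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation" -- the version check fails here
    err = e
print(err)
```

Autograd records each saved tensor's version when the forward op runs and re-checks it during `backward()`; any in-place write in between trips the mismatch.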
So I printed out the parameters in `net`:
which shows the so-called `[torch.cuda.FloatTensor [8]]` variables.
Then I looked at how the attention weights are built at the very beginning:
So my question is: since these parts use NumPy for the calculation, does that mean we are actually building two separate computation graphs?
Or should we implement this purely in PyTorch? The gradient calculation error seems to be caused by this.
Thanks in advance for answering!