Hi,
I'm trying to run the repo code with the Amazon and Yelp datasets before trying it on some of my own. With the Amazon dataset and the baseline model, I get the following error. I enabled torch.autograd.set_detect_anomaly(True) beforehand, so the first traceback is the forward call that anomaly mode flagged and the second one is the actual exception.
  File "/h/vkpriya/CP-VAE/run_baseline.py", line 91, in <module>
    main(args)
  File "/h/vkpriya/CP-VAE/run_baseline.py", line 63, in main
    valid_loss = model.fit()
  File "/scratch/ssd001/home/vkpriya/CP-VAE/models/aggressive_vae.py", line 189, in fit
    self.train(epoch)
  File "/scratch/ssd001/home/vkpriya/CP-VAE/models/aggressive_vae.py", line 89, in train
    logits, kl = self.vae.loss(batch_data_enc)
  File "/scratch/ssd001/home/vkpriya/CP-VAE/models/vae.py", line 41, in loss
    z, KL = self.encode(x, nsamples)
  File "/scratch/ssd001/home/vkpriya/CP-VAE/models/vae.py", line 35, in encode
    return self.encoder.encode(x, nsamples)
  File "/scratch/ssd001/home/vkpriya/CP-VAE/models/base_network.py", line 72, in encode
    mu, logvar = self.forward(inputs)
  File "/scratch/ssd001/home/vkpriya/CP-VAE/models/base_network.py", line 158, in forward
    mean, logvar = self.linear(hidden_repr).chunk(2, -1)
  File "/h/vkpriya/condaenvs/pyt_cu/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/h/vkpriya/condaenvs/pyt_cu/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/h/vkpriya/condaenvs/pyt_cu/torch/nn/functional.py", line 1612, in linear
    output = input.matmul(weight.t())
 (print_stack at /pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:60)
Vocabulary size: 60229
Experiment dir: /h/vkpriya/CP-VAE/outputs/baseline/amazon-amazon/20201201-223426
Traceback (most recent call last):
  File "/h/vkpriya/CP-VAE/run_baseline.py", line 91, in <module>
    main(args)
  File "/h/vkpriya/CP-VAE/run_baseline.py", line 63, in main
    valid_loss = model.fit()
  File "/scratch/ssd001/home/vkpriya/CP-VAE/models/aggressive_vae.py", line 189, in fit
    self.train(epoch)
  File "/scratch/ssd001/home/vkpriya/CP-VAE/models/aggressive_vae.py", line 128, in train
    loss.backward()
  File "/h/vkpriya/condaenvs/pyt_cu/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/h/vkpriya/condaenvs/pyt_cu/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1024, 64]], which is output 0 of TBackward, is at version 32; expected version 31 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
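From what I can tell, the [1024, 64] tensor is weight.t() inside F.linear, i.e. the transposed weight of self.linear in base_network.py. A transposed view shares its version counter with the parameter it comes from, so any in-place write to the weight after the forward pass also bumps the counter of the view that autograd saved. A small sketch with a hypothetical toy tensor (not the repo's model), reading the internal _version attribute only for inspection:

```python
import torch

# Hypothetical stand-in for self.linear.weight (out_features=2, in_features=4).
weight = torch.zeros(2, 4, requires_grad=True)

# F.linear uses weight.t(); the transposed view shares the weight's version counter.
weight_t = weight.t()
saved_version = weight_t._version  # the version autograd records when it saves the view

# An optimizer step performs an in-place update like this one under no_grad.
with torch.no_grad():
    weight.add_(1.0)

print(saved_version, weight_t._version)  # 0 1
```

"is at version 32; expected version 31" therefore suggests exactly one in-place write to that weight between the forward pass (aggressive_vae.py line 89) and loss.backward() (line 128).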
I am trying to debug it on my end, but any help would be appreciated!
(P.S.: I get the same error with both the baseline and the CP-VAE models on my own datasets.)
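A minimal toy reproduction of the same message, under the assumption (not confirmed against aggressive_vae.py) that an optimizer.step() on the encoder runs after the forward pass but before loss.backward(). The encoder/head modules and sizes are hypothetical stand-ins, not the CP-VAE code:

```python
import torch
import torch.nn as nn


def try_update(step_before_backward: bool) -> str:
    """Run one forward/backward/step cycle on a toy model.

    Returns "ok" if backward succeeds, otherwise the RuntimeError message.
    """
    torch.manual_seed(0)
    # Hypothetical toy layers standing in for the LSTM encoder and self.linear.
    encoder = nn.Linear(8, 16)
    head = nn.Linear(16, 4)
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    x = torch.randn(3, 8)

    # Warm-up pass so head.weight.grad is populated; SGD skips params whose grad is None.
    head(encoder(x)).sum().backward()

    # Forward pass: head saves a view of its weight, because the gradient
    # with respect to encoder's output (which requires grad) needs it.
    loss = head(encoder(x)).sum()

    try:
        if step_before_backward:
            opt.step()       # in-place update bumps head.weight's version counter
            loss.backward()  # saved weight view is now stale, so autograd raises
        else:
            loss.backward()  # gradients computed from the unmodified weight
            opt.step()       # safe: nothing still needs the old weight
        return "ok"
    except RuntimeError as err:
        return str(err)


print(try_update(step_before_backward=False))  # ok
print(try_update(step_before_backward=True))   # prints the RuntimeError message
```

If this is what happens in the training loop, calling backward() before the step, or redoing the forward pass after the step, would avoid backpropagating through a weight that has already been overwritten.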
Thank you!