agrimgupta92 / sgan

Code for "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", Gupta et al, CVPR 2018
MIT License
819 stars 260 forks

invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). #22

Open zeyayin opened 5 years ago

zeyayin commented 5 years ago

Hello, I ran into an issue when trying to train a new model on Windows with PyTorch 0.4.1.

I used the command `!python scripts/train.py --noise_dim=0`.

I removed all the cuda() calls in order to run on CPU, but I get this error:

    Traceback (most recent call last):
      File "scripts/train.py", line 580, in <module>
        main(args)
      File "scripts/train.py", line 245, in main
        optimizer_d)
      File "scripts/train.py", line 371, in discriminator_step
        generator_out = generator(obs_traj, obs_traj_rel, seq_start_end)
      File "C:\Users\zeya\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\zeya\sgan-master\scripts\sgan\models.py", line 508, in forward
        final_encoder_h = self.encoder(obs_traj_rel)
      File "C:\Users\zeya\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\zeya\sgan-master\scripts\sgan\models.py", line 63, in forward
        obs_traj_embedding = self.spatial_embedding(obs_traj.view(-1, 2))
    RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at c:\new-builder_3\win-wheel\pytorch\aten\src\th\generic/THTensor.cpp:237

angeliand commented 5 years ago

Hi! Could you provide the script you have run, so that the issue can be recreated?

zeyayin commented 5 years ago

> Hi! Could you provide the script you have run, so that the issue can be recreated?

Hi, I edited my issue, but I'm not quite sure what you mean by providing the script. Is the issue now in the form you expected?

angeliand commented 5 years ago

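This was exactly what I wanted, thank you! This is weird though, as it runs just fine for me. Here are a few things I would try if I were you:

* You could try editing the code in [sgan\models.py line 63](https://github.com/agrimgupta92/sgan/blob/master/sgan/models.py#L63) to
  `obs_traj_embedding = self.spatial_embedding(obs_traj.contiguous().view(-1, 2))`
* Or you could install the exact [requirements](https://github.com/agrimgupta92/sgan/blob/master/requirements.txt), as a version mismatch can cause problems too.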

zeyayin commented 5 years ago

After changing models.py line 63 and running the command !python scripts/train.py --noise_dim= it just keeps running, or perhaps gets stuck, without giving any output. I will try it on a desktop PC later.

zeyayin commented 5 years ago

> This was exactly what I wanted, thank you! This is weird though, as it runs just fine for me. I would try different things if I were you:
>
> * you could try editing the code in [sgan\models.py line 63](https://github.com/agrimgupta92/sgan/blob/master/sgan/models.py#L63) to
>   `obs_traj_embedding = self.spatial_embedding(obs_traj.contiguous().view(-1, 2))`
>   or
> * you could install the exact [requirements](https://github.com/agrimgupta92/sgan/blob/master/requirements.txt), as this can cause problems too.

Hello, after I added contiguous() and ran it on the desktop, it started asking me to update the CUDA driver version, even though I have deleted all the cuda() calls. Is there any way I can run on CPU instead of GPU?

xieshuaix commented 5 years ago

The issue arises from permuting the mini-batch in seq_collate(data) of the data loader without calling contiguous() afterwards: tensor.permute returns a tensor that is no longer contiguous in memory, and calling view on it raises an error. Training and testing on GPU are not subject to this issue because tensor.cuda() automatically makes the tensor contiguous. Using reshape() instead of view() may fix this problem as well. This can be seen from the debugger output below:

obs_traj.view(-1, 2).shape = {RuntimeError}invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at /pytorch/aten/src/TH/generic/THTensor.cpp:213
obs_traj.numpy().strides = {tuple} <class 'tuple'>: (4, 64, 32)
obs_traj.contiguous().view(-1, 2).shape = {Size} torch.Size([1304, 2])
obs_traj.contiguous().numpy().strides = {tuple} <class 'tuple'>: (1304, 8, 4)
obs_traj.cuda().cpu().numpy().shape = {tuple} <class 'tuple'>: (8, 163, 2)
obs_traj.cuda().cpu().numpy().strides = {tuple} <class 'tuple'>: (1304, 8, 4)
obs_traj.reshape(-1, 2).numpy().shape = {tuple} <class 'tuple'>: (1304, 2)
obs_traj.reshape(-1, 2).numpy().strides = {tuple} <class 'tuple'>: (8, 4)
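
For anyone wondering where those stride values come from, here is a minimal pure-Python sketch of the bookkeeping (it does not call PyTorch). It assumes float32 elements, a pre-permute batch of shape (163, 2, 8), and a permutation order of (2, 0, 1); none of these are stated in the thread, they are inferred from the strides printed above. The helper names are hypothetical, and the contiguity check is simplified (it does not special-case dimensions of size 1).

```python
# Pure-Python sketch of stride bookkeeping; no PyTorch required.
# Assumptions (inferred from the strides above, not stated in the thread):
# float32 elements (4 bytes), pre-permute shape (163, 2, 8), permute(2, 0, 1).

ITEMSIZE = 4  # bytes per float32 element (assumed)


def contiguous_strides(shape):
    """Row-major strides, in elements, for a freshly allocated tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def permute(shape, strides, dims):
    """Reorder shape and strides without moving any data."""
    return tuple(shape[d] for d in dims), tuple(strides[d] for d in dims)


def is_contiguous(shape, strides):
    """Simplified check: strides must equal the row-major layout."""
    return strides == contiguous_strides(shape)


def to_bytes(strides):
    """Convert element strides to byte strides, as NumPy reports them."""
    return tuple(s * ITEMSIZE for s in strides)


base_shape = (163, 2, 8)
base_strides = contiguous_strides(base_shape)  # (16, 8, 1) in elements

# Permuting only relabels the axes, so the strides end up out of order.
shape, strides = permute(base_shape, base_strides, (2, 0, 1))
print(shape, to_bytes(strides), is_contiguous(shape, strides))
# (8, 163, 2) (4, 64, 32) False

# A contiguous copy of the same shape gets fresh row-major strides.
fixed = contiguous_strides(shape)
print(to_bytes(fixed), is_contiguous(shape, fixed))
# (1304, 8, 4) True
```

The first printed tuple of strides matches the non-contiguous case above, and the second matches what contiguous() (or the round trip through cuda()) produces. Flattening to (-1, 2) with view only works on the second layout, because view cannot copy data.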

davidglavas commented 5 years ago

In models.py:

This resolved the issue for me.