Hi,
In line 35, change

```python
raw_w = extract_image_patches(b, ksizes=[kernel, kernel], strides=[self.rate * self.stride, self.rate * self.stride], rates=[1, 1], padding='same')
```

to

```python
raw_w = extract_image_patches(b, ksizes=[self.ksize, self.ksize], strides=[self.rate * self.stride, self.rate * self.stride], rates=[1, 1], padding='same')
```

Also, uncomment line 168 to match the padding.
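For context, a rough sketch of what an `extract_image_patches` port typically does, built on `torch.nn.functional.unfold`. This is illustrative only, not the repo's exact helper:

```python
import torch
import torch.nn.functional as F

def extract_image_patches(images, ksizes, strides, rates, padding='same'):
    """Illustrative sketch of a TF-style extract_image_patches for NCHW tensors."""
    batch, channels, rows, cols = images.size()
    if padding == 'same':
        # TF 'same' padding: output spatial size is ceil(input / stride)
        out_rows = (rows + strides[0] - 1) // strides[0]
        out_cols = (cols + strides[1] - 1) // strides[1]
        eff_rows = (ksizes[0] - 1) * rates[0] + 1  # effective kernel height
        eff_cols = (ksizes[1] - 1) * rates[1] + 1  # effective kernel width
        pad_rows = max(0, (out_rows - 1) * strides[0] + eff_rows - rows)
        pad_cols = max(0, (out_cols - 1) * strides[1] + eff_cols - cols)
        images = F.pad(images, [pad_cols // 2, pad_cols - pad_cols // 2,
                                pad_rows // 2, pad_rows - pad_rows // 2])
    # unfold returns [N, C * k_h * k_w, L], where L is the number of patches
    return F.unfold(images, kernel_size=ksizes, dilation=rates, stride=strides)
```

With `ksizes=[self.ksize, self.ksize]`, the patch size always matches the attention kernel, which is why the fix above replaces `kernel` with `self.ksize`.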
Thanks for catching the typos!!
Hello, thanks for resolving that error, but now I am getting a different one! Here:
```
torch.Size([8, 256])
2020-06-30 08:33:32,022 ERROR size mismatch, m1: [8 x 256], m2: [16384 x 1] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283
Traceback (most recent call last):
  File "train.py", line 173, in <module>
    main()
  File "train.py", line 169, in main
    raise e
  File "train.py", line 108, in main
    losses, coarse_result, inpainted_result = trainer(x, mask, ground_truth)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/Global-and-Local-Attention-Based-Free-Form-Image-Inpainting/scripts/trainer.py", line 44, in forward
    refine_real, refine_fake = self.dis_forward(self.globalD, ground_truth, x2_inpaint.detach())
  File "/content/Global-and-Local-Attention-Based-Free-Form-Image-Inpainting/scripts/trainer.py", line 68, in dis_forward
    batch_output = netD(batch_data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/Global-and-Local-Attention-Based-Free-Form-Image-Inpainting/model/network.py", line 225, in forward
    x = self.linear(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [8 x 256], m2: [16384 x 1] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283
```
The top two lines of that output are from print statements I inserted in network.py. Please let me know if you know the fix for this.
Hi,
This is because the spatial size is not correct in the discriminator's linear layer. Change line 215 in Network.py from

```python
self.linear = nn.Linear(self.cnum * 4 * 8 * 8, 1)
```

to

```python
self.linear = nn.Linear(self.cnum * 4 * 1 * 1, 1)
```
and you should be good to go!
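If you'd rather not hard-code the spatial size at all, here is a minimal sketch of inferring the flattened size with a dummy forward pass. This is a hypothetical helper, with `self.dis_conv_module` standing in for the discriminator's conv stack, whatever it is named in Network.py:

```python
import torch
import torch.nn as nn

def infer_flat_features(conv_stack: nn.Module, input_shape=(3, 256, 256)) -> int:
    """Hypothetical helper: push a dummy batch through the conv stack to
    discover the flattened feature count for the final linear layer."""
    with torch.no_grad():
        out = conv_stack(torch.zeros(1, *input_shape))
    return out.view(1, -1).size(1)

# usage sketch:
# self.linear = nn.Linear(infer_flat_features(self.dis_conv_module), 1)
```

This keeps the linear layer consistent if the training resolution changes again.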
Hey, I tried what you suggested, but that throws a different error that seems unrelated 😅
```
ERROR one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Traceback (most recent call last):
  File "train.py", line 173, in <module>
    main()
  File "train.py", line 169, in main
    raise e
  File "train.py", line 124, in main
    losses['g'].backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
Hi, can I know your environment settings? In my environment (PyTorch 1.4.0, CUDA 10), it works. Let me try to reproduce the error in your environment. Alternatively, you can change all inplace operations of the activation functions (ReLU/ELU) to False, specifically in lines 302-312 of Network.py. Cheers!
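For illustration, a minimal generic example (not this repo's code) of why an inplace activation can break autograd, and how the out-of-place version avoids it:

```python
import torch

x = torch.randn(3, requires_grad=True)

# Failing pattern: sigmoid's backward pass needs its own output, but relu_()
# (inplace, like nn.ReLU(inplace=True)) overwrites that output:
#   y = x.sigmoid()
#   y.relu_()
#   y.sum().backward()  # RuntimeError: ... modified by an inplace operation

# Out-of-place version keeps the saved tensor intact:
y = x.sigmoid().relu()
y.sum().backward()  # works
```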
Hi, it was a problem with Colab (PyTorch 1.5, CUDA 10.1). I reset my runtime and a different error popped up.
```
ERROR shape '[4, 128, 7, 7]' is invalid for input of size 18432
Traceback (most recent call last):
  File "train.py", line 173, in <module>
    main()
  File "train.py", line 169, in main
    raise e
  File "train.py", line 108, in main
    losses, coarse_result, inpainted_result = trainer(x, mask, ground_truth)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/Global-and-Local-Attention-Based-Free-Form-Image-Inpainting/scripts/trainer.py", line 39, in forward
    x1, x2 = self.netG(x, masks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/Global-and-Local-Attention-Based-Free-Form-Image-Inpainting/model/network.py", line 22, in forward
    x_stage2 = self.fine_generator(x, x_stage1, mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/Global-and-Local-Attention-Based-Free-Form-Image-Inpainting/model/network.py", line 187, in forward
    x = self.contextul_attention(x, x, mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/Global-and-Local-Attention-Based-Free-Form-Image-Inpainting/model/Attention.py", line 167, in forward
    y.contiguous().view(raw_int_fs)
RuntimeError: shape '[4, 128, 7, 7]' is invalid for input of size 18432
```
Let me share the link to the Colab notebook. I have added the edits you suggested in one of my repos, and I am copying it there directly so it can run smoothly for anyone else. Let me know if you find anything!
Let me reproduce the error. Edit #1: please comment out line 168 and let me know if you face any error. Line 168 is there for any padding changes required; in your case, I think no padding is required.
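Roughly, the padding that line applies looks something like this (an illustrative sketch, not the exact code of line 168):

```python
import torch.nn.functional as F

# hypothetical padding step: grow the attention output by one pixel on the
# right and bottom, e.g. from 6x6 to 7x7, so a later .view() succeeds
# y = F.pad(y, [0, 1, 0, 1])
```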
@SayedNadim I am getting the gradient error again after commenting out line 168. In the notebook I shared, I had forgotten to comment out that line, so I was getting the padding error. I have updated my repo, and now you will be able to see the gradient error.
Yes, I can observe the error in the Colab. Can you please try this with PyTorch 1.4.0 and let me know?
Hey, it turns out it was a version problem with PyTorch. I am finally able to get the model to train. Thanks a lot for your help!
No worries! I am closing this issue then. Cheers!
@SayedNadim Hey, just wanted to let you know that equation (5) in your paper has a typo, at least in the version linked in the repo.
Hey, I am working on RGB-channel MNIST data. I am trying to see if I can get a good inpainting model for some secondary applications. I am getting the following error -
I am attaching a part of the config file as well.
Do you know what's causing the error? This same error seems to be occurring in other similar inpainting networks as well! I confirmed that the images are OK by reading them the same way as the `__getitem__` method of the dataset class, and I confirmed that the tensor is of shape [3, 28, 28].
A couple more unrelated points - the train.py file should be in the main directory, right? Both of these give a module-not-found error: from the main directory, `python scripts/train.py --config configs/config.yaml`, and from the scripts directory, `python train.py --config configs/config.yaml`.
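In case it helps anyone else hitting the same import error, here is a hypothetical workaround for the top of scripts/train.py, assuming the missing modules live in the repo root:

```python
import os
import sys

# make the repo root importable when running `python scripts/train.py`
# from the main directory
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
```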
There is a typo in model/network.py: in `Conv2dBlock`, the default argument should be `pad_type='zeros'`, not `pad_type='zero'` (and the if/else statements should change to match). The error was caused by the `self.conv = nn.Conv2d` line below it: `padding_mode` needs the string `'zeros'`, not `'zero'`.
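For reference, a minimal sketch of the valid argument (a generic example; the channel sizes are made up):

```python
import torch.nn as nn

# nn.Conv2d only accepts padding_mode in {'zeros', 'reflect', 'replicate',
# 'circular'}; passing 'zero' raises a ValueError at construction time
conv = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3,
                 padding=1, padding_mode='zeros')
```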