Open NerminSalem opened 5 years ago
@NerminSalem The problem here is that the number of training images, 1803460, is not evenly divisible by the batch size, which is 32 by default. You can simply comment out and ignore the assert statement. Also, you may want to change the batch size (--batch_size) depending on how much VRAM you have.
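A less invasive alternative to deleting the assert, assuming the repo uses a standard PyTorch DataLoader (the dataset below is a hypothetical stand-in, not the repo's actual loader), is to pass drop_last=True so the final partial batch is simply discarded:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the training dataset:
# 10 samples with batch_size=4 would leave a partial batch of 2.
dataset = TensorDataset(torch.randn(10, 3, 8, 8))

# drop_last=True discards the final incomplete batch, so every batch
# has exactly batch_size samples and the divisibility assert is moot.
loader = DataLoader(dataset, batch_size=4, drop_last=True)

batch_sizes = [batch[0].shape[0] for batch in loader]
print(batch_sizes)  # [4, 4]
```

This way the dataset size no longer needs to be a multiple of the batch size, at the cost of skipping a few samples per epoch.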
@bobqywei Thanks a lot for your reply. I commented it out and it worked, but I have another question regarding the GPU you used. I have an NVIDIA GeForce 960, and it always says I am out of memory, even when I shrank the dataset to 32 images and 1 mask. Excuse my question if it seems silly, but I am new to image inpainting. Thanks!
@NerminSalem No problem. I used a V100 16GB on Google Cloud Platform, which is why I was able to use a large batch size of 32 (the default value for the batch_size option). I think the 960 has only 2 or 4GB of VRAM, so you will have to reduce the batch size by a lot. (Batch size is the number of training samples loaded and processed by your GPU at the same time, and all of those images plus the model weights must fit in VRAM.)
E.g. python train.py --batch_size 4
Thanks a lot for your reply. I will update the batch_size. Your help is appreciated.
@bobqywei I have a question regarding training on Google Cloud Platform. Can you give me details on how to use it and how the pricing works? Sorry, I know my question is out of scope. Thanks a lot for understanding.
@NerminSalem Pretty simple, actually: just use your existing Google account to set up, and I think you can even get up to $300 worth of free credits for the first year. Then you can start up a VM instance with the desired specs and select the default PyTorch image. (For GPU access you have to request a quota increase.) Pricing for a V100 is around $2 per hour, I think, although you can go even cheaper by selecting the preemptible option.
That's pretty much the gist of it. The documentation that Google has is pretty thorough. To remote into the VM, I installed gcloud for easy SSH.
Thanks a lot!
When training the model, have you met a similar issue?

File "/home/fl/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
File "/home/fl/place2/loss.py", line 95, in forward
    loss_dict["tv"] = total_variation_loss(composed_output, self.l1) * LAMBDAS["tv"]
File "/home/zhangxl004/place2/loss.py", line 50, in total_variation_loss
    loss = l1(image[:, :, :, :-1] - image[:, :, :, 1:]) + l1(image[:, :, :-1, :] - image[:, :, 1:, :])
File "/home/fl/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'target'

I would very much appreciate it if you could help me. Thank you!
I have tested two PyTorch versions, 1.0.0 and 1.0.1, but in vain; my current torch version is 1.0.0.
Hi, have you solved this problem?
TypeError: forward() missing 1 required positional argument: 'target'
Thanks for your time.
@zhangbaijin Now that I have some time, I'll probably try to update the codebase for the latest torch version next week.
In the meantime, you can fix the above by changing that line to:
loss = l1(image[:, :, :, :-1], image[:, :, :, 1:]) + l1(image[:, :, :-1, :], image[:, :, 1:, :])
since nn.L1Loss now takes two arguments (input, target).
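For reference, a self-contained sketch of the corrected total_variation_loss with the two-argument nn.L1Loss call (function signature is my own; the repo's may differ slightly):

```python
import torch
import torch.nn as nn

def total_variation_loss(image: torch.Tensor, l1: nn.L1Loss) -> torch.Tensor:
    # Pass neighboring shifted slices as (input, target) to L1Loss,
    # instead of subtracting them and calling L1Loss with one argument.
    return (l1(image[:, :, :, :-1], image[:, :, :, 1:]) +
            l1(image[:, :, :-1, :], image[:, :, 1:, :]))

# Sanity check: a constant image has zero total variation.
flat = torch.ones(1, 3, 4, 4)
print(total_variation_loss(flat, nn.L1Loss()).item())  # 0.0
```

nn.L1Loss averages the absolute differences between its two arguments, so this is equivalent to the old single-argument form that subtracted the slices first.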
Thanks for sharing your work. I downloaded the masks from issue #1 and resized them to 256×256, and I want to train with the Places2 dataset (256×256 images). But when I try to train using python3 train.py, I get the following error:

Loaded training dataset with 1803460 samples and 12000 masks
Traceback (most recent call last):
  File "train.py", line 73, in <module>
    assert(data_size % args.batch_size == 0)
AssertionError
Can you help? @bobqywei