Open NerminSalem opened 5 years ago
@NerminSalem The problem here is that the number of training images, 1803460, is not evenly divisible by the batch size, which is 32 by default. You can simply comment out and ignore the assert statement. Also, you may want to change the batch size (--batch_size) depending on how much VRAM you have.
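A less invasive alternative to deleting the assert, assuming the repo uses a standard PyTorch DataLoader (the dataset below is a hypothetical stand-in, not the repo's actual loader), is to pass drop_last=True so the final partial batch is simply discarded:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the training dataset:
# 10 samples with batch_size=4 would leave a partial batch of 2.
dataset = TensorDataset(torch.randn(10, 3, 8, 8))

# drop_last=True discards the final incomplete batch, so every batch
# has exactly batch_size samples and the divisibility assert is moot.
loader = DataLoader(dataset, batch_size=4, drop_last=True)

batch_sizes = [batch[0].shape[0] for batch in loader]
print(batch_sizes)  # [4, 4]
```

This way the dataset size no longer needs to be a multiple of the batch size, at the cost of skipping a few samples per epoch.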
@bobqywei Thanks a lot for your reply. I commented it out and it worked, but I have another question regarding the GPU you used. I have an NVIDIA GeForce 960, and it always says I am out of memory, even when I shrank the dataset to 32 images and 1 mask. Excuse my question if it seems silly, but I am new to image inpainting. Thanks!
@NerminSalem No problem. I used a V100 16GB on Google Cloud Platform, which is why I was able to use a large batch size of 32 (the default value for the batch_size option). I think the 960 has only 2 or 4GB of VRAM, so you will have to reduce the batch size by a lot. (Batch size is the number of training samples loaded and processed by your GPU at the same time, and all of those images plus the model weights must fit in VRAM.)
E.g. python train.py --batch_size 4
Thanks a lot for your reply. I will update the batch_size. Your help is appreciated.
@bobqywei I have a question regarding training on Google Cloud Platform. Can you give me details on how to use it and how the pricing works? Sorry, I know my question is out of scope. Thanks a lot for understanding.
@NerminSalem Pretty simple, actually: just use your existing Google account to set up, and I think you can even get up to $300 worth of free credits for the first year. Then you can start up a VM instance with the desired specs and select the default PyTorch image. (For GPU access you have to request a quota increase.) Pricing for a V100 is around $2 per hour, I think, although you can go even cheaper by selecting the preemptible option.
That's pretty much the gist of it. The documentation that Google has is pretty thorough. To remote into the VM, I installed gcloud for easy SSH.
Thanks a lot!
When training the model, have you met a similar issue?

File "/home/fl/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
File "/home/fl/place2/loss.py", line 95, in forward
    loss_dict["tv"] = total_variation_loss(composed_output, self.l1) * LAMBDAS["tv"]
File "/home/zhangxl004/place2/loss.py", line 50, in total_variation_loss
    loss = l1(image[:, :, :, :-1] - image[:, :, :, 1:]) + l1(image[:, :, :-1, :] - image[:, :, 1:, :])
File "/home/fl/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'target'

I would very much appreciate it if you could help me. Thank you!
I have tested two PyTorch versions, 1.0.0 and 1.0.1, but in vain; my current torch version is 1.0.0.
Hi, have you solved this problem?
TypeError: forward() missing 1 required positional argument: 'target'
Thanks for your time.
@zhangbaijin Now that I have some time, I'll probably try to update the codebase for the latest torch version next week.
In the meantime, you can fix the above by changing that line to:
loss = l1(image[:, :, :, :-1], image[:, :, :, 1:]) + l1(image[:, :, :-1, :], image[:, :, 1:, :])
since nn.L1Loss now takes two arguments (input, target).
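For reference, a self-contained sketch of the corrected total_variation_loss with the two-argument nn.L1Loss call (function signature is my own; the repo's may differ slightly):

```python
import torch
import torch.nn as nn

def total_variation_loss(image: torch.Tensor, l1: nn.L1Loss) -> torch.Tensor:
    # Pass neighboring shifted slices as (input, target) to L1Loss,
    # instead of subtracting them and calling L1Loss with one argument.
    return (l1(image[:, :, :, :-1], image[:, :, :, 1:]) +
            l1(image[:, :, :-1, :], image[:, :, 1:, :]))

# Sanity check: a constant image has zero total variation.
flat = torch.ones(1, 3, 4, 4)
print(total_variation_loss(flat, nn.L1Loss()).item())  # 0.0
```

nn.L1Loss averages the absolute differences between its two arguments, so this is equivalent to the old single-argument form that subtracted the slices first.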
Thanks for sharing your work. I downloaded the masks from issue #1 and resized them to 256×256, and I want to train with the Places2 dataset (256×256 images). But when I try to train using python3 train.py, I get the following error:

Loaded training dataset with 1803460 samples and 12000 masks
Traceback (most recent call last):
  File "train.py", line 73, in <module>
    assert(data_size % args.batch_size == 0)
AssertionError
Can you help? @bobqywei