NVIDIA / pix2pixHD

Synthesizing and manipulating 2048x1024 images with conditional GANs
https://tcwang0509.github.io/pix2pixHD/
Other
6.64k stars 1.39k forks source link

RuntimeError: CUDA error: out of memory (Is there any problem with pytorch) #63

Open panduranga007 opened 6 years ago

panduranga007 commented 6 years ago

(TENSOR) C:\Users\Ravi\pix2pixHD>python train.py --name label2city_60p --batchSize 1 --label_nc 0 ------------ Options ------------- batchSize: 1 beta1: 0.5 checkpoints_dir: ./checkpoints continue_train: False data_type: 32 dataroot: ./datasets/cityscapes/ debug: False display_freq: 100 display_winsize: 512 feat_num: 3 fineSize: 512 gpu_ids: [0] input_nc: 3 instance_feat: False isTrain: True label_feat: False label_nc: 0 lambda_feat: 10.0 loadSize: 1024 load_features: False load_pretrain: lr: 0.0002 max_dataset_size: inf model: pix2pixHD nThreads: 2 n_blocks_global: 9 n_blocks_local: 3 n_clusters: 10 n_downsample_E: 4 n_downsample_global: 4 n_layers_D: 3 n_local_enhancers: 1 name: label2city_60p ndf: 64 nef: 16 netG: global ngf: 64 niter: 100 niter_decay: 100 niter_fix_global: 0 no_flip: False no_ganFeat_loss: False no_html: False no_instance: False no_lsgan: False no_vgg_loss: False norm: instance num_D: 2 output_nc: 3 phase: train pool_size: 0 print_freq: 100 resize_or_crop: scale_width save_epoch_freq: 10 save_latest_freq: 1000 serial_batches: False tf_log: False use_dropout: False verbose: False which_epoch: latest -------------- End ---------------- CustomDatasetDataLoader dataset [AlignedDataset] was created

training images = 4000

GlobalGenerator( (model): Sequential( (0): ReflectionPad2d((3, 3, 3, 3)) (1): Conv2d(4, 64, kernel_size=(7, 7), stride=(1, 1)) (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (6): ReLU(inplace) (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (9): ReLU(inplace) (10): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (11): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (12): ReLU(inplace) (13): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (14): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (15): ReLU(inplace) (16): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (17): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (18): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (19): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (20): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (21): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (22): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (23): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (24): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (25): ConvTranspose2d(1024, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (26): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (27): ReLU(inplace) (28): ConvTranspose2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (29): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (30): ReLU(inplace) (31): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (32): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (33): ReLU(inplace) (34): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (35): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (36): ReLU(inplace) (37): ReflectionPad2d((3, 3, 3, 3)) (38): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1)) (39): Tanh() ) ) Traceback (most recent call last): File "train.py", line 38, in model = create_model(opt) File "C:\Users\Ravi\pix2pixHD\models\models.py", line 15, in create_model model.initialize(opt) File "C:\Users\Ravi\pix2pixHD\models\pix2pixHD_model.py", line 39, in initialize opt.n_blocks_local, opt.norm, gpu_ids=self.gpu_ids) File "C:\Users\Ravi\pix2pixHD\models\networks.py", line 44, in define_G netG.cuda(gpu_ids[0]) File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 258, in cuda return self._apply(lambda t: t.cuda(device)) File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 185, in _apply module._apply(fn) File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 185, in _apply module._apply(fn) File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 185, in _apply module._apply(fn) File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 191, in _apply param.data = fn(param.data) File "C:\Users\Ravi\Anaconda3\envs\TENSOR\lib\site-packages\torch\nn\modules\module.py", line 258, in return self._apply(lambda t: t.cuda(device)) RuntimeError: CUDA error: out of memory

ZubairKhan001 commented 6 years ago

got same error

kleinyoni commented 6 years ago

How much memory do you have in your GPU? I think 12 should be OK.

AlexanderKozhevin commented 5 years ago

@kleinyoni Should we have 12Gb on a single card or I can have few video cards which gives 12Gb in total or more ?

kleinyoni commented 5 years ago

@AlexanderKozhevin alI think a single card. Since the primary data loading is done to one GPU(i think. not sure).

Try to use --ngf 32 instead.

It should use less memory.

Also you can try to run it on the cloud and see if additional memory help with the run.

vobject commented 5 years ago

The --resize_or_crop option has a greate effect on memory usage for me. Passing --resize_or_crop scale_width worked for the default testing images in the repo with a 11GiB GTX 1080Ti: python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop scale_width

sxudai commented 5 years ago

The --resize_or_crop option has a greate effect on memory usage for me. Passing --resize_or_crop scale_width worked for the default testing images in the repo with a 11GiB GTX 1080Ti: python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop scale_width

thanks for you. it does work.

8067222151 commented 2 years ago

The --resize_or_crop option has a greate effect on memory usage for me. Passing --resize_or_crop scale_width worked for the default testing images in the repo with a 11GiB GTX 1080Ti: python test.py --name label2city_1024p --netG local --ngf 32 --resize_or_crop scale_width

really worked , thank you