NVIDIA / vid2vid

Pytorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.

Pose to Body returns CUDA error #32

Open cuuupid opened 6 years ago

cuuupid commented 6 years ago

When trying to run Pose2Body using the following setup:

I get the following error about CUDA:

error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "train.py", line 273, in <module>
    train()
  File "train.py", line 105, in train
    flow_ref, conf_ref = flowNet(real_B, real_B_prev)  # reference flows and confidences
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/vid2vid/models/flownet.py", line 33, in forward
    flow, conf = self.compute_flow_and_conf(input_A, input_B)
  File "/home/ubuntu/vid2vid/models/flownet.py", line 50, in compute_flow_and_conf
    flow1 = self.flowNet(data1)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/vid2vid/models/flownet2_pytorch/models.py", line 126, in forward
    flownetc_flow2 = self.flownetc(x)[0]
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/vid2vid/models/flownet2_pytorch/networks/FlowNetC.py", line 86, in forward
    out_corr = self.corr(out_conv3a, out_conv3b) # False
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/vid2vid/models/flownet2_pytorch/networks/correlation_package/correlation.py", line 59, in forward
    result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
  File "/home/ubuntu/vid2vid/models/flownet2_pytorch/networks/correlation_package/correlation.py", line 27, in forward
    self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:79)
frame #0: <unknown function> + 0x140f8 (0x7fef4d7c40f8 in /home/ubuntu/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0x1433e (0x7fef4d7c433e in /home/ubuntu/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x107e1 (0x7fef4d7c07e1 in /home/ubuntu/.local/lib/python3.6/site-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #10: THPFunction_do_forward(THPFunction*, _object*) + 0x2ad (0x7fef969c9f8d in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

I thought this was a GPU/CUDA issue, but I am using the AWS Deep Learning Image, so everything should be set up correctly. PyTorch also seems to work fine with CUDA, as the following Python interactive session shows:

>>> import torch
>>> torch.cuda.set_device(0)
>>> torch.cuda.get_device_capability(0)
(3, 7)
>>> x = torch.cuda.FloatTensor(1)
>>> y = torch.cuda.FloatTensor(1)
>>> x + y
tensor([0.], device='cuda:0')
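
A successful `x + y` only shows that PyTorch's own kernels were built for this GPU. A separately compiled extension such as correlation_cuda carries its own list of target architectures, and "no kernel image is available" means none of them matches the device. The sketch below illustrates that compatibility rule for the (3, 7) capability reported above. The function names and the `built` list are hypothetical, chosen only for illustration, and are not read from any real build.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_37' or 'compute_70' into (kind, major, minor)."""
    kind, digits = tag.split("_")
    return kind, int(digits[:-1]), int(digits[-1])


def extension_supports(capability, built_tags):
    """Return True if any built target can run on a device of this capability."""
    major, minor = capability
    for tag in built_tags:
        kind, t_major, t_minor = parse_arch(tag)
        if kind == "sm":
            # Machine code runs only within the same major version,
            # on devices whose minor version is at least the target's.
            if t_major == major and t_minor <= minor:
                return True
        elif kind == "compute":
            # PTX can be JIT-compiled for any equal or newer capability.
            if (t_major, t_minor) <= (major, minor):
                return True
    return False


# Hypothetical target list for an extension built without K80 support.
built = ["sm_50", "sm_52", "sm_60", "sm_61", "sm_70", "compute_70"]
print(extension_supports((3, 7), built))              # K80 against the list above
print(extension_supports((3, 7), built + ["sm_37"]))  # same list with sm_37 added
```

With these assumed targets the first call prints False and the second prints True, which is consistent with the error appearing only inside the compiled correlation kernel.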

I tried removing the label_nc 0 option, but this produced a bigger error:

/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [384,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [385,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [386,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [387,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [388,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [389,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [390,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [391,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [392,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [393,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [394,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [395,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [396,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [397,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [398,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [399,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [403,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [404,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [405,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [407,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [408,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [409,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [410,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [411,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [412,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [413,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [414,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [20,0,0], thread: [415,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
Traceback (most recent call last):
  File "train.py", line 273, in <module>
    train()
  File "train.py", line 97, in train
    fake_B, fake_B_raw, flow, weight, real_A, real_Bp, fake_B_last = modelG(input_A, input_B, inst_A, fake_B_last)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/vid2vid/models/vid2vid_model_G.py", line 125, in forward
    fake_B, fake_B_raw, flow, weight = self.generate_frame_train(netG, real_A_all, fake_B_prev, start_gpu, is_first_frame)
  File "/home/ubuntu/vid2vid/models/vid2vid_model_G.py", line 170, in generate_frame_train
    fake_B_feat, flow_feat, fake_B_fg_feat, use_raw_only)
  File "/home/ubuntu/vid2vid/models/networks.py", line 161, in forward
    downsample = self.model_down_seg(input) + self.model_down_img(img_prev)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CuDNN error: CUDNN_STATUS_INTERNAL_ERROR
terminate called after throwing an instance of 'at::Error'
  what():  CUDA error: invalid device pointer (CudaCachingDeleter at /opt/conda/conda-bld/pytorch_1532579245307/work/aten/src/THC/THCCachingAllocator.cpp:498)
frame #0: THStorage_free + 0x44 (0x7f931b353314 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: THTensor_free + 0x2f (0x7f931b3f2a1f in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #2: at::CUDAFloatTensor::~CUDAFloatTensor() + 0x9 (0x7f92fabc72e9 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: torch::autograd::Variable::Impl::~Impl() + 0x291 (0x7f931cf45761 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #4: torch::autograd::Variable::Impl::~Impl() + 0x9 (0x7f931cf458d9 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x770cd9 (0x7f931cf5ecd9 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x770d84 (0x7f931cf5ed84 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #21: __libc_start_main + 0xf0 (0x7f93335f2830 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
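
The repeated assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` is a bounds check on the indices passed to a scatter operation. One plausible reading, which is an assumption and not confirmed anywhere in this thread, is that with label_nc left at 35 the input is treated as a map of class ids and one-hot encoded, so any pixel value of 35 or more has no channel to land in. A minimal sketch of that range check, using a hypothetical helper and made-up pixel values:

```python
def out_of_range_labels(label_map, label_nc):
    """Return the sorted values that cannot index label_nc one-hot channels."""
    bad = {v for row in label_map for v in row if not 0 <= v < label_nc}
    return sorted(bad)


# Hypothetical 2x3 input whose values look like image intensities, not class ids.
pixels = [[0, 12, 255], [34, 35, 128]]
print(out_of_range_labels(pixels, 35))  # prints [35, 128, 255]
```

An empty result would mean every value fits within the configured number of label channels.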

The following is the network info printed out by train.py:

------------ Options -------------
TTUR: False
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: ./datasets/Dance/dancer1/
dataset_mode: temporal
debug: False
display_freq: 100
display_id: 0
display_winsize: 512
feat_num: 3
fg: False
fg_labels: [26]
fineSize: 512
gan_mode: ls
gpu_ids: [0]
input_nc: 3
isTrain: True
label_feat: False
label_nc: 35
lambda_F: 10.0
lambda_T: 10.0
lambda_feat: 10.0
loadSize: 256
load_features: False
load_pretrain:
lr: 0.0002
max_dataset_size: inf
max_frames_backpropagate: 1
max_frames_per_gpu: 6
model: vid2vid
nThreads: 2
n_blocks: 9
n_blocks_local: 3
n_downsample_E: 3
n_downsample_G: 2
n_frames_D: 3
n_frames_G: 3
n_frames_total: 6
n_gpus_gen: 1
n_layers_D: 3
n_local_enhancers: 1
n_scales_spatial: 1
n_scales_temporal: 3
name: pose2body_256_g1
ndf: 64
nef: 32
netE: simple
netG: composite
ngf: 128
niter: 10
niter_decay: 10
niter_fix_global: 0
niter_step: 5
no_first_img: False
no_flip: False
no_ganFeat: False
no_html: False
no_vgg: False
norm: batch
num_D: 1
output_nc: 3
phase: train
pool_size: 1
print_freq: 100
resize_or_crop: scaleWidth
save_epoch_freq: 1
save_latest_freq: 1000
serial_batches: False
tf_log: False
use_instance: False
use_single_G: False
which_epoch: latest
-------------- End ----------------
CustomDatasetDataLoader
dataset [TemporalDataset] was created
#training videos = 1
vid2vid
---------- Networks initialized -------------
-----------------------------------------------
---------- Networks initialized -------------
-----------------------------------------------
create web directory ./checkpoints/pose2body_256_g1/web..

Using a Tesla K80 with 12GB, CUDA 9.0, CUDNN 7.0.5, Ubuntu 16.04

cuuupid commented 6 years ago

The default single-GPU training script, scripts/train_256_g1.sh, also gives the first error.

dustinfreeman commented 6 years ago

Alas, I don't recognize these errors.

WangzhiDai commented 5 years ago

I also hit this problem when using AWS with a K80 GPU. It turned out to be a problem with how flownet is installed. The K80 architecture is not listed in the 3 setup.py files, so the compiled extensions are not compatible with the K80. Adding a line (or changing an existing one) to '-gencode', 'arch=compute_37,code=sm_37' in all 3 setup.py files and then reinstalling flownet solved the problem.

Refer to https://github.com/NVIDIA/flownet2-pytorch/issues/33
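
To apply the same fix on a different GPU, the flag pair can be derived from the capability tuple that `torch.cuda.get_device_capability(0)` returns. This is only a sketch: `gencode_flags` is a hypothetical helper, and where the resulting pair goes in each setup.py depends on how that file builds its nvcc argument list, which is not shown in this thread.

```python
def gencode_flags(capability):
    """Build the nvcc '-gencode' pair for a (major, minor) compute capability."""
    major, minor = capability
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


# (3, 7) is the capability reported for the K80 earlier in this issue.
# On a live machine the tuple could come from torch.cuda.get_device_capability(0).
print(gencode_flags((3, 7)))  # prints ['-gencode', 'arch=compute_37,code=sm_37']
```

The extensions have to be rebuilt and reinstalled after the flags change, since the architecture list is fixed at compile time.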

yuanzhou15 commented 5 years ago

@WangzhiDai I'm currently trying to overcome this problem on Google Colab, which also has a K80 GPU. Did you just have to do that for installing flownet, but not for download_models_flownet2?

SinanGncgl commented 5 years ago

> @WangzhiDai I'm currently trying to overcome this problem on Google Colab, which also has a K80 GPU. Did you just have to do that for installing flownet, but not for download_models_flownet2?

Did you manage to run the code on Google Colab?

kartikJ-9 commented 4 years ago

Did anyone try this in colab? I am getting a similar error.