NVIDIA / vid2vid

Pytorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.

Train on my own datasets #142

Open birdflyto opened 4 years ago

birdflyto commented 4 years ago

Hello, I have finished running the demo in the examples. When I trained on my own dataset, which I prepared following the notes, I got the errors below. I have tried several ways to solve it, but all failed. I would greatly appreciate any guidance. Thank you!

CUDA_VISIBLE_DEVICES=1 python train.py --name label2city_256 --label_nc 1 --loadSize 256 --use_instance --fg --n_downsample_G 2 --num_D 1 --max_frames_per_gpu 6 --n_frames_total 6

------------ Options -------------
TTUR: False
add_face_disc: False
basic_point_only: False
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
dataroot: datasets/Cityscapes/
dataset_mode: temporal
debug: False
densepose_only: False
display_freq: 100
display_id: 0
display_winsize: 512
feat_num: 3
fg: True
fg_labels: [26]
fineSize: 512
fp16: False
gan_mode: ls
gpu_ids: [0]
input_nc: 3
isTrain: True
label_feat: False
label_nc: 1
lambda_F: 10.0
lambda_T: 10.0
lambda_feat: 10.0
loadSize: 256
load_features: False
load_pretrain:
local_rank: 0
lr: 0.0002
max_dataset_size: inf
max_frames_backpropagate: 1
max_frames_per_gpu: 6
max_t_step: 1
model: vid2vid
nThreads: 2
n_blocks: 9
n_blocks_local: 3
n_downsample_E: 3
n_downsample_G: 2
n_frames_D: 3
n_frames_G: 3
n_frames_total: 6
n_gpus_gen: 1
n_layers_D: 3
n_local_enhancers: 1
n_scales_spatial: 1
n_scales_temporal: 2
name: label2city_256
ndf: 64
nef: 32
netE: simple
netG: composite
ngf: 128
niter: 10
niter_decay: 10
niter_fix_global: 0
niter_step: 5
no_canny_edge: False
no_dist_map: False
no_first_img: False
no_flip: False
no_flow: False
no_ganFeat: False
no_html: False
no_vgg: False
norm: batch
num_D: 1
openpose_only: False
output_nc: 3
phase: train
pool_size: 1
print_freq: 100
random_drop_prob: 0.05
random_scale_points: False
remove_face_labels: False
resize_or_crop: scaleWidth
save_epoch_freq: 1
save_latest_freq: 1000
serial_batches: False
sparse_D: False
tf_log: False
use_instance: True
use_single_G: False
which_epoch: latest
-------------- End ----------------
CustomDatasetDataLoader dataset [TemporalDataset] was created

training videos = 1

vid2vid ---------- Networks initialized -------------

---------- Networks initialized -------------

create web directory ./checkpoints/label2city_256/web...

(4,)

!!! (1, 8, 1, 128, 256) /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [448,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [449,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [450,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [451,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [452,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. 
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [453,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [454,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [455,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [456,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [457,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. 
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [458,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [459,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [460,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [461,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [462,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. 
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [463,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [464,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [465,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [466,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [467,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. 
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [468,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [469,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [470,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [471,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [472,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. 
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [473,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [474,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [475,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [476,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [477,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed. 
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [478,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [108,0,0], thread: [479,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered
label
Traceback (most recent call last):
  File "train.py", line 150, in <module>
    train()
  File "train.py", line 56, in train
    fake_B, fake_B_raw, flow, weight, real_A, real_Bp, fake_B_last = modelG(input_A, input_B, inst_A, fake_B_prev_last)
  File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/public/home/lcc-dx07/perl5/vid2vid-master/models/models.py", line 38, in forward
    outputs = self.model(*inputs, **kwargs, dummy_bs=self.pad_bs)
  File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/public/home/lcc-dx07/perl5/vid2vid-master/models/vid2vid_model_G.py", line 125, in forward
    real_A_all, real_B_all, _ = self.encode_input(input_A, input_B, inst_A)
  File "/public/home/lcc-dx07/perl5/vid2vid-master/models/vid2vid_model_G.py", line 97, in encode_input
    print('label',input_label)
  File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/tensor.py", line 66, in __repr__
    return torch._tensor_str._str(self)
  File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/_tensor_str.py", line 277, in _str
    tensor_str = _tensor_str(self, indent)
  File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/_tensor_str.py", line 195, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/_tensor_str.py", line 84, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/generated/../THCReduceAll.cuh:317
terminate called after throwing an instance of 'c10::Error'
  what(): CUDA error: device-side assert triggered (insert_events at /opt/conda/conda-bld/pytorch_1544199946412/work/aten/src/THC/THCCachingAllocator.cpp:470)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f2e869e6cc5 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x135cb20 (0x7f2e8a4d9b20 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #2: at::TensorImpl::release_resources() + 0x50 (0x7f2e87041f90 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: <unknown function> + 0x2ad98b (0x7f2e8132998b in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: torch::autograd::Variable::Impl::release_resources() + 0x17 (0x7f2e815a0127 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x121b2b (0x7f2ec717cb2b in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x31b8df (0x7f2ec73768df in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x31b921 (0x7f2ec7376921 in /public/home/lcc-dx07/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #24: __libc_start_main + 0xf5 (0x7f2ee0cefc05 in /lib64/libc.so.6)
Aborted (core dumped)
coolchaits commented 3 years ago

Were you able to resolve this error? I am running into the same error while using a custom dataset.
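
A possible cause, not confirmed anywhere in this thread: the failing assertion (indexValue >= 0 && indexValue < tensor.sizes[dim]) is raised by a scatter kernel, which is what a one-hot encoding of the label map would launch. If that is what happens in encode_input, every label id has to lie in [0, label_nc). With --label_nc 1 only the value 0 would be valid, so a mask saved as 0/255, or a foreground id such as the 26 shown in fg_labels, would trip it. Device-side asserts are reported asynchronously, so the Python traceback pointing at the print call is probably just the first later CUDA operation that noticed the failure; rerunning with CUDA_LAUNCH_BLOCKING=1 should make the traceback point at the real call. Below is a minimal sketch for checking a dataset before training. The helper name out_of_range_labels and the sample pixel values are hypothetical and not part of the vid2vid repository.

```python
def out_of_range_labels(values, label_nc):
    """Return the sorted distinct label ids that fall outside [0, label_nc).

    `values` is any iterable of integer label ids, for example the pixel
    values of one label image flattened into a list.
    """
    return sorted({int(v) for v in values if not 0 <= int(v) < label_nc})


# Hypothetical binary mask exported as 0/255 (an assumption for illustration).
mask_values = [0, 255, 255, 0]

# With label_nc=1 the only valid id is 0, so 255 is reported.
print(out_of_range_labels(mask_values, label_nc=1))  # [255]

# Remapping the foreground to 1 and declaring two classes passes the check.
remapped = [0 if v == 0 else 1 for v in mask_values]
print(out_of_range_labels(remapped, label_nc=2))  # []
```

If the check reports any values for your label images, either remap them to consecutive ids starting at 0 or raise --label_nc so that it exceeds the largest id.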