Open kashyap92 opened 4 years ago
Hello, I get the same error when training FlowNet2C. How did you solve this problem?
Hi. With FlowNet2C you should use the MultiScale loss with the L1 norm (--loss=MultiScale --loss_norm=L1). Using the L1 loss directly is for the FlowNet2 network.
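To make the difference a bit more concrete: the `rsub() ... got (Tensor, tuple)` error reported in this thread suggests that FlowNet2C hands the loss a tuple of flow predictions during training, which a plain L1 loss cannot subtract from a single target tensor. The sketch below is not the repository's code. It only illustrates what the logged defaults (loss_l_weight: 0.32, loss_numScales: 5, loss_startScale: 4) would mean for a 128x128 crop, under the assumption that scale i average-pools the target by loss_startScale * 2**i and is weighted by loss_l_weight / 2**i. The function name `multiscale_schedule` is made up for illustration.

```python
def multiscale_schedule(crop_size, l_weight=0.32, num_scales=5, start_scale=4):
    """Return one (pool_factor, target_size, weight) tuple per scale.

    Assumption (not confirmed in this thread): scale i average-pools the
    ground-truth flow by start_scale * 2**i and is weighted by l_weight / 2**i.
    """
    schedule = []
    for i in range(num_scales):
        pool_factor = start_scale * 2 ** i
        # Spatial size of the pooled target that prediction i is compared to.
        target_size = tuple(dim // pool_factor for dim in crop_size)
        schedule.append((pool_factor, target_size, l_weight / 2 ** i))
    return schedule


if __name__ == "__main__":
    for pool_factor, target_size, weight in multiscale_schedule((128, 128)):
        print(f"pool x{pool_factor:<2d} target {target_size} weight {weight:.2f}")
```

If that assumption holds, the coarsest of the five targets for a 128x128 crop is only 2x2 pixels, so a larger --crop_size may be worth trying when the coarse scales matter.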
What command should we use for FlowNet2S?
I ran this command and got the following error:
Command:
!CUDA_AVAILABLE_DEVICES=0 python main.py --total_epochs 100 --batch_size 8 --model FlowNet2S --loss=MultiScale \
--loss_norm=L1 --optimizer=Adam --optimizer_lr=1e-4 --skip_validation --crop_size 128 128 \
--training_dataset MpiSintelFinal \
--training_dataset_root /content/drive/MyDrive/fyp/MPI-Sintel-complete/training \
--resume /content/drive/MyDrive/fyp/FlowNet2-S_checkpoint.pth.tar \
--save /content/drive/MyDrive/checkpoints
Error:
Parsing Arguments
[0.033s] batch_size: 8
[0.033s] crop_size: [128, 128]
[0.033s] fp16: False
[0.033s] fp16_scale: 1024.0
[0.033s] gradient_clip: None
[0.033s] inference: False
[0.033s] inference_batch_size: 1
[0.033s] inference_dataset: MpiSintelClean
[0.033s] inference_dataset_replicates: 1
[0.033s] inference_dataset_root: ./MPI-Sintel/flow/training
[0.033s] inference_n_batches: -1
[0.033s] inference_size: [-1, -1]
[0.033s] inference_visualize: False
[0.033s] log_frequency: 1
[0.033s] loss: MultiScale
[0.033s] loss_l_weight: 0.32
[0.033s] loss_norm: L1
[0.033s] loss_numScales: 5
[0.033s] loss_startScale: 4
[0.033s] model: FlowNet2S
[0.033s] model_batchNorm: False
[0.033s] model_div_flow: 20
[0.033s] name: run
[0.033s] no_cuda: False
[0.033s] number_gpus: 1
[0.033s] number_workers: 8
[0.033s] optimizer: Adam
[0.033s] optimizer_amsgrad: False
[0.033s] optimizer_betas: (0.9, 0.999)
[0.033s] optimizer_eps: 1e-08
[0.033s] optimizer_lr: 0.0001
[0.033s] optimizer_weight_decay: 0
[0.033s] render_validation: False
[0.033s] resume: /content/drive/MyDrive/fyp/FlowNet2-S_checkpoint.pth.tar
[0.033s] rgb_max: 255.0
[0.033s] save: /content/drive/MyDrive/checkpoints
[0.033s] save_flow: False
[0.033s] schedule_lr_fraction: 10
[0.033s] schedule_lr_frequency: 0
[0.033s] seed: 1
[0.033s] skip_training: False
[0.033s] skip_validation: True
[0.033s] start_epoch: 1
[0.033s] total_epochs: 100
[0.033s] train_n_batches: -1
[0.034s] training_dataset: MpiSintelFinal
[0.034s] training_dataset_replicates: 1
[0.034s] training_dataset_root: /content/drive/MyDrive/fyp/MPI-Sintel-complete/training
[0.034s] validation_dataset: MpiSintelClean
[0.034s] validation_dataset_replicates: 1
[0.034s] validation_dataset_root: ./MPI-Sintel/flow/training
[0.034s] validation_frequency: 5
[0.034s] validation_n_batches: -1
[0.037s] Operation finished
Source Code
Current Git Hash: b'00cff7e3c07547ecdfa1b3314252963a36e705ec'
Initializing Datasets
[0.367s] Training Dataset: MpiSintelFinal
[0.413s] Training Input: [3, 2, 128, 128]
[0.454s] Training Targets: [2, 128, 128]
[0.454s] Operation finished
Building FlowNet2S model
[0.471s] Effective Batch Size: 8
[0.471s] Number of parameters: 38676506
[0.471s] Initializing CUDA
[4.738s] Parallelizing
[4.738s] Loading checkpoint '/content/drive/MyDrive/fyp/FlowNet2-S_checkpoint.pth.tar'
[4.863s] Loaded checkpoint '/content/drive/MyDrive/fyp/FlowNet2-S_checkpoint.pth.tar' (at epoch 0)
[4.863s] Initializing save directory: /content/drive/MyDrive/checkpoints
[4.866s] Operation finished
Initializing Adam Optimizer
[0.000s] amsgrad = False (<class 'bool'>)
[0.000s] weight_decay = 0 (<class 'int'>)
[0.000s] eps = 1e-08 (<class 'float'>)
[0.000s] betas = (0.9, 0.999) (<class 'tuple'>)
[0.000s] lr = 0.0001 (<class 'float'>)
[0.000s] Operation finished
Overall Progress: 0%| | 0/101 [00:00<?, ?it/s]
Training Epoch 0: 0%| | 0/130.0 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 439, in <module>
train_loss, iterations = train(args=args, epoch=epoch, start_iteration=global_iteration, data_loader=train_loader, model=model_and_loss, optimizer=optimizer, logger=train_logger, offset=offset)
File "main.py", line 294, in train
loss_val.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Function MulBackward0 returned an invalid gradient at index 1 - expected type torch.cuda.FloatTensor but got torch.FloatTensor
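A side note on the command above, separate from the traceback: the CUDA runtime selects devices through `CUDA_VISIBLE_DEVICES`. `CUDA_AVAILABLE_DEVICES` is not a variable it reads, so that prefix has no effect on which GPU is used. This is unlikely to explain the RuntimeError (the log reports number_gpus: 1), but it is easy to check. A minimal sketch, with hypothetical helper names `selected_gpus` and `unrecognized_device_vars`:

```python
import os


def selected_gpus(environ=None):
    """Return the device entries listed in CUDA_VISIBLE_DEVICES, or None if unset."""
    env = os.environ if environ is None else environ
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # unset: every GPU stays visible to the process
    return [item.strip() for item in value.split(",") if item.strip()]


def unrecognized_device_vars(environ=None):
    """List CUDA_*_DEVICES variables that are not CUDA_VISIBLE_DEVICES (likely typos)."""
    env = os.environ if environ is None else environ
    return sorted(
        name
        for name in env
        if name.startswith("CUDA_")
        and name.endswith("_DEVICES")
        and name != "CUDA_VISIBLE_DEVICES"
    )


if __name__ == "__main__":
    print("GPU selection:", selected_gpus())
    print("Possibly misspelled variables:", unrecognized_device_vars())
```

Run in the same shell or notebook cell as the training command, this shows whether a GPU selection is actually in effect.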
I ran into the same problem. Have you managed to fix it? Looking forward to your reply.
Hello all, I am training FlowNet2C on MPI Sintel data and I am running into this issue: TypeError: rsub() received an invalid combination of arguments - got (Tensor, tuple), but expected one of:
The complete error is:
File "main.py", line 426, in <module>
train_loss, iterations = train(args=args, epoch=epoch, start_iteration=global_iteration, data_loader=train_loader, model=model_and_loss, optimizer=optimizer, logger
, offset=offset) 0%| | 0/130.0 [00:00<?, ?it/s]
File "main.py", line 267, in train
losses = model(data[0], target[0])
File "/datasets/Mahesh_kashyap/Opical_flow/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/datasets/Mahesh_kashyap/project/Opical_flow/env/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/datasets/Mahesh_kashyap/project/Opical_flow/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "main.py", line 172, in forward
loss_values = self.loss(output, target)
File "/datasets/Mahesh_kashyap/Optical_flow/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/datasets/Mahesh_kashyap/Optical_flow/flownet2-pytorch/losses.py", line 37, in forward
lossvalue = self.loss(output, target)
File "/datasets/Mahesh_kashyap/Optical_flow/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/datasets/Mahesh_kashyap/Optical_flow/flownet2-pytorch/losses.py", line 18, in forward
lossvalue = torch.abs(output - target).mean()
File "/datasets/Mahesh_kashyap/Optical_flow/env/lib/python3.6/site-packages/torch/tensor.py", line 363, in __rsub__
return _C._VariableFunctions.rsub(self, other)
TypeError: rsub() received an invalid combination of arguments - got (Tensor, tuple), but expected one of:
The command I am using is as follows:
CUDA_VISIBLE_DEVICES=1 python main.py --batch_size 8 --model FlowNet2C --loss=L1Loss --optimizer=Adam --optimizer_lr=1e-10 \
--training_dataset MpiSintelFinal --training_dataset_root /datasets/Mahesh_kashyap/Optical_flow/flownet2-pytorch/MPI-Sintel/flow/training \
--validation_dataset MpiSintelClean --validation_dataset_root /datasets/Mahesh_kashyap/Optical_flow/flownet2-pytorch/MPI-Sintel/flow/training
Any help would be much appreciated.
Thank you.