NVIDIA / semantic-segmentation

Nvidia Semantic Segmentation monorepo
BSD 3-Clause "New" or "Revised" License

ValueError: recompute_scale_factor is not meaningful with an explicit size #103

Closed SupriyaB1 closed 3 years ago

SupriyaB1 commented 3 years ago

Hi, I am trying to run inference on only one image. Data folder: /home/HRNet/semantic-segmentation/large_data/data/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/

I am not able to run the command with runx (python3 -m runx.runx scripts/dump_folder.yml -i); it fails with: /home/HRNet/bin/python: No module named torch.distributed

So I am running the command directly: python3 -m torch.distributed.launch --nproc_per_node=1 /home/HRNet/semantic-segmentation/train.py --dataset cityscapes --cv 0 --bs_val 1 --n_scales "1.0,2.0" --eval folder --eval_folder '/home/HRNet/semantic-segmentation/large_data/data/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/test/' --snapshot "ASSETS_PATH/seg_weights/cityscapes_ocrnet.HRNet_Mscale_outstanding-turtle.pth" --arch ocrnet.HRNet_Mscale --result_dir ./save

None
Global Rank: 0 Local Rank: 0
Torch version: 1.7, 1.7.0+cu101
n scales [1.0, 2.0]
dataset = cityscapes
ignore_label = 255
num_classes = 19
Found 1 folder imgs
cn num_classes 19
Using Cross Entropy Loss
Using Cross Entropy Loss
Loading weights from: checkpoint=/home/HRNet/semantic-segmentation/large_data/seg_weights/cityscapes_ocrnet.HRNet_Mscale_outstanding-turtle.pth
Warning: using Python fallback for SyncBatchNorm, possibly because apex was installed without --cuda_ext. The exception raised when attempting to import the cuda backend was: No module named 'syncbn'
=> init weights from normal distribution
=> loading pretrained model /home/HRNet/semantic-segmentation/large_data/seg_weights/hrnetv2_w48_imagenet_pretrained.pth
Trunk: hrnetv2
Model params = 72.1M
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
  enabled               : True
  opt_level             : O1
  cast_model_type       : None
  patch_torch_functions : True
  keep_batchnorm_fp32   : None
  master_weights        : None
  loss_scale            : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
  enabled               : True
  opt_level             : O1
  cast_model_type       : None
  patch_torch_functions : True
  keep_batchnorm_fp32   : None
  master_weights        : None
  loss_scale            : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.

Traceback (most recent call last):
  File "/home/HRNet/semantic-segmentation/train.py", line 601, in <module>
    main()
  File "/home/HRNet/semantic-segmentation/train.py", line 426, in main
    dump_all_images=True)
  File "/home/HRNet/semantic-segmentation/train.py", line 574, in validate
    args, val_idx)
  File "/home/HRNet/semantic-segmentation/utils/trnval_utils.py", line 142, in eval_minibatch
    output_dict = net(inputs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/apex/parallel/distributed.py", line 560, in forward
    result = self.module(*inputs, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HRNet/semantic-segmentation/network/ocrnet.py", line 332, in forward
    return self.nscale_forward(inputs, cfg.MODEL.N_SCALES)
  File "/home/HRNet/semantic-segmentation/network/ocrnet.py", line 238, in nscale_forward
    pred = scale_as(pred, cls_out)
  File "/home/HRNet/semantic-segmentation/network/mynn.py", line 79, in scale_as
    align_corners=align_corners, recompute_scale_factor=True)
  File "/home/.local/lib/python3.6/site-packages/apex/amp/wrap.py", line 28, in wrapper
    return orig_fn(*new_args, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 3110, in interpolate
    raise ValueError("recompute_scale_factor is not meaningful with an explicit size.")
ValueError: recompute_scale_factor is not meaningful with an explicit size.
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/.local/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', '/home/HRNet/semantic-segmentation/train.py', '--local_rank=0', '--dataset', 'cityscapes', '--cv', '0', '--bs_val', '1', '--n_scales', '1.0,2.0', '--eval', 'folder', '--eval_folder', '/home/HRNet/semantic-segmentation/large_data/data/cityscapes/leftImg8bit_trainvaltest/leftImg8bit/test/', '--snapshot', 'ASSETS_PATH/seg_weights/cityscapes_ocrnet.HRNet_Mscale_outstanding-turtle.pth', '--arch', 'ocrnet.HRNet_Mscale', '--result_dir', './save']' returned non-zero exit status 1.

Please help me to solve this error.

I am using Ubuntu 18.04, PyTorch 1.7, Python 3.6, CUDA 10.1.
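(For anyone hitting this later: the failure can be reproduced outside the repo. Starting with PyTorch 1.7, torch.nn.functional.interpolate rejects recompute_scale_factor when an explicit size is given, which is exactly the call scale_as in network/mynn.py makes. A minimal sketch, with tensor shapes chosen arbitrarily for illustration:)

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 19, 32, 64)  # e.g. a low-resolution class prediction

# Passing an explicit target size together with recompute_scale_factor=True
# raises ValueError in torch >= 1.7, matching the traceback above.
try:
    F.interpolate(x, size=(128, 256), mode='bilinear',
                  align_corners=False, recompute_scale_factor=True)
except ValueError as e:
    print(e)
```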

ajtao commented 3 years ago

Looks like you'll have to modify line 79 of mynn.py to remove the recompute_scale_factor=True arg.
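For reference, a minimal sketch of what the fixed scale_as can look like. This is an illustration under assumptions, not the repo's exact code; the real function in network/mynn.py takes its align_corners setting from the repo config.

```python
import torch
import torch.nn.functional as F

def scale_as(x, y, align_corners=False):
    # Resize x to y's spatial size. Because an explicit `size` is passed,
    # recompute_scale_factor is meaningless and is simply dropped
    # (torch >= 1.7 raises ValueError if both are supplied).
    return F.interpolate(x, size=y.shape[2:], mode='bilinear',
                         align_corners=align_corners)
```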

liguoyu666 commented 3 years ago

Looks like you'll have to modify line 79 of mynn.py to remove the recompute_scale_factor=True arg.

How should it be modified? I met the same problem, and changed 'recompute_scale_factor' to False.

ajtao commented 3 years ago

Remove the argument entirely, don't just set it to False.

SupriyaB1 commented 3 years ago

Thanks ajtao, it worked.

SupriyaB1 commented 3 years ago

Is it possible to do inference only on the sky and terrain classes? If yes, can you please let me know how to do it?

ajtao commented 3 years ago

Hi @SupriyaB1 I'd like to suggest that you try to get more familiar with the code. I bet that with just a few minutes of work you can figure it out.
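(For later readers: one simple route is to post-process the dumped predictions rather than change the network: keep the pixels whose train ID belongs to the classes of interest and overwrite everything else with the ignore label. A hedged sketch in plain Python. The Cityscapes train IDs for terrain and sky are assumed to be 9 and 10 here, so verify them against cityscapesscripts' labels.py, and keep_sky_and_terrain is a hypothetical helper, not part of this repo.)

```python
# Assumed Cityscapes train IDs; check against cityscapesscripts' labels.py.
TERRAIN_ID, SKY_ID = 9, 10
IGNORE = 255  # the ignore_label the repo already uses

def keep_sky_and_terrain(label_map):
    """label_map: rows of per-pixel train IDs, as in a dumped prediction.

    Returns a copy where every pixel outside the two kept classes
    is replaced by the ignore label.
    """
    keep = {TERRAIN_ID, SKY_ID}
    return [[p if p in keep else IGNORE for p in row] for row in label_map]
```

The same masking applies unchanged to a NumPy array loaded from the prediction images written under --result_dir.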

SupriyaB1 commented 3 years ago

Thanks @ajtao , I am able to do it now.