NVIDIA / semantic-segmentation

Nvidia Semantic Segmentation monorepo
BSD 3-Clause "New" or "Revised" License

need your help #13

Closed sde123 closed 5 years ago

sde123 commented 5 years ago

Hi authors, thank you for sharing your great work! I ran into a problem when running your code (./scripts/submit_cityscapes_WideResNet38.sh ./cityscapes_best.pth ./result) and need your help:

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 10.91 GiB total capacity; 8.29 GiB already allocated; 1.06 GiB free; 877.60 MiB cached)

Could you please tell me how to solve it? Looking forward to your help, thank you very much!

bryanyzhu commented 5 years ago

@karansapra Could you look into this? I tried on my end and found that we need at least 3 GPUs to run the evaluation code; otherwise it causes an OOM error.

@sde123 Can I ask how many GPU cards you are using to run the evaluation, and how much memory each card has? Thank you.

sde123 commented 5 years ago

@bryanyzhu Thanks for your help. I have two GPUs (GTX 1080 Ti), about 11 GB each, so about 22 GB of GPU memory in total.

sde123 commented 5 years ago

@bryanyzhu Thanks for your help. Is there any way to solve this problem? Must we have at least 3 GPUs to run the evaluation code?

bryanyzhu commented 5 years ago

@sde123 Looking into the issue, will let you know asap.

sde123 commented 5 years ago

@bryanyzhu Thanks for your help. I do not understand why it costs so much GPU memory; two GTX 1080 Ti cards are still not enough.

bryanyzhu commented 5 years ago

@sde123 The reason is that when we do sliding-based multi-scale evaluation with flipping, the input can sometimes be 6x3x2048x2048, which costs a lot of GPU memory. As a simple fix, I forward this data 6 times, so each forward pass only sees a 1x3x2048x2048 input; this works on my end with one V100 card (16 GB of memory).

I changed the lines here from

with torch.no_grad():
    input_crops = Variable(input_crops.cuda())
    output_scattered = model(input_crops)

to

torch.cuda.empty_cache()
with torch.no_grad():
    bi, _, hi, wi = input_crops.size()
    if hi >= args.crop_size:
        # Large crops: forward one crop at a time to keep peak GPU memory low
        output_scattered_list = []
        for b_idx in range(bi):
            cur_input = input_crops[b_idx, :, :, :].unsqueeze(0).cuda()
            cur_output = model(cur_input)
            output_scattered_list.append(cur_output)
        output_scattered = torch.cat(output_scattered_list, dim=0)
    else:
        # Small crops: forward the whole batch at once, as before
        input_crops = input_crops.cuda()
        output_scattered = model(input_crops)

I haven't fully tested this change yet, but I think it should work. If you still experience OOM issues, please change --scales 0.5,1.0,2.0 to --scales 0.5,1.0.

I will continue to look into this issue as well as several efficiency issues during evaluation. I will update the repo very soon.

sde123 commented 5 years ago

@bryanyzhu Thank you for your help. I run this in the terminal: ./scripts/submit_cityscapes_WideResNet38.sh ./cityscapes_best.pth ./result. Do I need to specify multi-GPU in the command? I find nothing about multi-GPU in eval.py.

bryanyzhu commented 5 years ago

No, you don't need to. But that's a good point; it is also something I want to add to the eval.py script.
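For reference, a minimal sketch of how evaluation typically picks up multiple GPUs in PyTorch; the traceback later in this thread suggests eval.py wraps the network in DataParallel, but the variable names here are assumptions:

import torch

# DataParallel splits each input batch across every GPU visible through
# CUDA_VISIBLE_DEVICES, so no extra multi-GPU flag is needed on the command line.
net = torch.nn.DataParallel(net).cuda()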

karansapra commented 5 years ago

@bryanyzhu @sde123 Maybe it makes sense to put back fp16 inference? I can add it to the feature list.

bryanyzhu commented 5 years ago

@karansapra Good point, it is nice to have fp16 inference. Thank you for adding it.
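A minimal sketch of what fp16 inference could look like in plain PyTorch; this is only an illustration under assumed names (net, images), not the implementation that will land in the repo, which may use apex/AMP and keep batch norm in fp32 instead:

import torch

net = net.half().cuda().eval()          # cast weights to half precision
with torch.no_grad():
    logits = net(images.half().cuda())  # inputs must be cast to half as well
pred = logits.float().argmax(dim=1)     # do the argmax / post-processing back in fp32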

karansapra commented 5 years ago

:+1:

karansapra commented 5 years ago

Still debugging an issue for single GPU. I should have it resolved over this long weekend. Wanted to keep you guys posted.

karansapra commented 5 years ago

Hi @sde123 @bryanyzhu, could you try this out? This should support single-GPU sliding-window inference. It's slow, but it should work. https://github.com/NVIDIA/semantic-segmentation/tree/sliding_inference_single_gpu
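For anyone hitting this on other setups, here is a generic sketch of single-GPU sliding-window inference: forward one crop at a time and average the logits over overlapping regions. It only illustrates the idea and is not the code in the branch above; model, crop_size, stride, and num_classes are placeholder assumptions.

import torch

def sliding_inference(model, image, crop_size=1024, stride=768, num_classes=19):
    """image: a 3xHxW float tensor on CPU; returns num_classes x H x W averaged logits."""
    _, h, w = image.shape
    logits = torch.zeros(num_classes, h, w)
    counts = torch.zeros(1, h, w)
    ys = list(range(0, max(h - crop_size, 0) + 1, stride))
    xs = list(range(0, max(w - crop_size, 0) + 1, stride))
    # Make sure the bottom/right borders are covered by a final crop.
    if ys[-1] + crop_size < h:
        ys.append(h - crop_size)
    if xs[-1] + crop_size < w:
        xs.append(w - crop_size)
    with torch.no_grad():
        for y in ys:
            for x in xs:
                crop = image[:, y:y + crop_size, x:x + crop_size].unsqueeze(0).cuda()
                out = model(crop).squeeze(0).cpu()  # one crop at a time keeps GPU memory low
                logits[:, y:y + out.shape[1], x:x + out.shape[2]] += out
                counts[:, y:y + out.shape[1], x:x + out.shape[2]] += 1
    return logits / counts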

karansapra commented 5 years ago

@sde123 Let me know if you are seeing any issues. :)

bryanyzhu commented 5 years ago

@karansapra I still have the GPU OOM issue when using a single GPU.

I'm using the command below, does this look ok?

#!/usr/bin/env bash
echo "Running inference on" ${1}
echo "Saving Results :" ${2}
sleep 10
PYTHONPATH=$PWD:$PYTHONPATH CUDA_VISIBLE_DEVICES=0 python eval.py \
    --dataset cityscapes \
    --arch network.deepv3.DeepWV3Plus \
    --inference_mode sliding \
    --scales 0.5,1.0,2.0 \
    --split val \
    --cv_split 2 \
    --dump_images \
    --ckpt_path ${2} \
    --snapshot ${1}

karansapra commented 5 years ago

@bryanyzhu Looks correct. Are you using the new branch? Also, how much RAM does your GPU have?

bryanyzhu commented 5 years ago

@karansapra Yes, I think so; if I weren't using the new branch, the sleep 10 wouldn't be in the script. But let me check again today. My GPU has 16 GB.

bryanyzhu commented 5 years ago

@karansapra Yep, still OOM, confirmed on the new branch. The error log is below:

Running inference on ../pretrained_models/cityscapes_best_wideresnet38.pth
Saving Results : ./logs/
Using regular batch norm
Logging : ./logs/../val/eval_2019_09_24_22_57_08_rank_0.log
09-24 22:57:08.038 Network Arch: network.deepv3.DeepWV3Plus
09-24 22:57:08.038 CV split: 2
09-24 22:57:08.038 Exp_name: ..
09-24 22:57:08.038 Ckpt path: ./logs/
09-24 22:57:08.038 Scales : 0.5 1.0 2.0
09-24 22:57:08.038 Inference mode: sliding
09-24 22:57:08.038 val fine cities: ['train/monchengladbach', 'train/strasbourg', 'train/stuttgart']
09-24 22:57:08.042 Cityscapes-val: 655 images
09-24 22:57:08.042 Load model file: ../pretrained_models/cityscapes_best_wideresnet38.pth
09-24 22:57:08.053 Trunk: WideResnet38
09-24 22:57:09.185 Global Average Pooling Initialized
=====================Could not load ImageNet weights=======================
Please download the ImageNet weights of WideResNet38 in our repo to ./pretrained_models.
09-24 22:57:14.505 Model params = 137.1M
09-24 22:57:15.141 Checkpoint Load Compelete
eval val:   0%|                                                                                                                                                                                          | 0/655 [00:00<?, ?it/s]Traceback (most recent call last):
  File "eval.py", line 599, in <module>
    main()
  File "eval.py", line 590, in main
    runner.inf(imgs, img_names, gt, inference, net, scales, pbar, base_img)
  File "eval.py", line 451, in inf
    prediction_pre_argmax_collection = inference(net, img, scales)
  File "eval.py", line 318, in inference_sliding
    output_scattered_by_part_1 = model(input_crops[:part_1]).cpu()
  File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/yizhu/code/semantic-segmentation/network/deepv3.py", line 282, in forward
    dec1 = self.final(dec0)
  File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 4.51 GiB (GPU 0; 15.75 GiB total capacity; 8.18 GiB already allocated; 3.00 GiB free; 3.52 GiB cached)

It's weird. So you can run this on your end with one GPU? Thanks.

karansapra commented 5 years ago

Hmm, weird. Let me have a look and get back to you. Thanks for checking, @bryanyzhu!

shecker commented 5 years ago

Any news on this issue? I am also having trouble running inference on a single GPU with a batch size greater than 1. I modified the demo.py script to run on a custom dataset. Sometimes it allocates memory and runs inference successfully, whereas other times it just gives the CUDA out-of-memory error. I don't know why this is happening, but I suspect the network requests some spike of memory on initialization, because when it goes through without crashing it only needs around 4 GB. Any way to solve this memory issue? Thanks.

zhangyuan1994511 commented 5 years ago

Hi authors, thank you for sharing your great work! I see the mIoU is 83.454 on the Cityscapes leaderboard, but when I submit the test results produced with "submit_cityscapes_WideResNet38.sh" and the released "cityscapes_best.pth", I get an mIoU of 83.351. I don't know why, and I checked the test code without finding any problem. Could you please tell me the reason? Looking forward to your help, thank you very much!

zhangyuan1994511 commented 5 years ago

@bryanyzhu

bryanyzhu commented 5 years ago

@zhangyuan1994511 I will double-check the code to see if there is any difference. In the meantime, can you check your input as well? Maybe some images are missing or the predictions are not saved correctly. Check that your submission contains 1525 images.
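A quick way to sanity-check the submission, assuming the predictions are written as PNG files under the result directory (the path below is illustrative):

import glob

preds = glob.glob('./result/**/*.png', recursive=True)
print(len(preds))  # the Cityscapes test split should yield 1525 prediction images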

zhangyuan1994511 commented 5 years ago

Thanks for your help. I have already checked the predictions; the submission contains 1525 images, without any problems. I will submit a paper about semantic segmentation in which I cite your paper and compare against your results in my demo. So I need the same results as the Cityscapes leaderboard and your paper. Are there any other parameters not contained in "submit_cityscapes_WideResNet38.sh", or anything I need to be especially careful about? Looking forward to your reply, thank you very much!

zhangyuan1994511 commented 5 years ago

@bryanyzhu

bryanyzhu commented 5 years ago

@zhangyuan1994511 I briefly checked the script and found nothing weird. At this moment, you can use either number (83.45 or 83.35) for the comparison and include it in your submission. I will investigate this after the CVPR deadline.

@shecker Sorry for the slow response; I will look into this issue after the CVPR deadline.

shecker commented 5 years ago

@bryanyzhu Hi, no worries, I actually solved the issue. It was hardware-related: it worked on a Titan Xp but not on an older Titan X or a 1080 Ti, even though memory usage, once inference had started, was only around 6 GB. It could be some peak memory requirement on initialization.

bryanyzhu commented 5 years ago

@shecker Thank you for the info. Yes, I think so.

bryanyzhu commented 5 years ago

@shecker I may have found the reason: please change this line to False. Then there is no peak memory usage.

For me, if I turn it on, there is a peak memory usage of 12.5 GB somewhere near the beginning. After I turn it off, the maximum memory usage is 5 GB, which is the normal usage I expect. Please try it on your side to see if this is the case. Thank you.

Right now, this only works for demo.py. I will investigate further and push a change for both demo.py and eval.py.

For eval.py, using both my previous tricks and turning off cudnn.benchmark limits the maximum memory usage to 12.5 GB. But I want to make it even lower, because theoretically a forward pass of a 3x2048x2048 image only needs about 9 GB of memory.
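For context, the flag in question is PyTorch's cuDNN autotuner switch; a minimal sketch of the one-line change being suggested, assuming it sits near the top of demo.py / eval.py:

import torch

# With benchmark=True, cuDNN tries several convolution algorithms on the first forward
# pass and keeps the fastest one; that search can transiently allocate a lot of extra
# memory. Setting it to False trades a bit of speed for a much lower peak.
torch.backends.cudnn.benchmark = False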

@sde123 @karansapra

zhangyuan1994511 commented 5 years ago

@bryanyzhu Thank you very much for your reply; I will use the 83.35 result. If you find the reason, please let me know in time. Wish you success in your work!

kwea123 commented 5 years ago

@bryanyzhu @zhangyuan1994511 I used cityscapes_best but got 83.38... Does the evaluation method have something stochastic that makes it different each time?

Eval log:

10-16 11:50:58.333 Network Arch: network.deepv3.DeepWV3Plus
10-16 11:50:58.334 CV split: 0
10-16 11:50:58.334 Exp_name: 
10-16 11:50:58.334 Ckpt path: submit2
10-16 11:50:58.334 Scales : 0.5 1.0 2.0
10-16 11:50:58.334 Inference mode: sliding
10-16 11:50:58.476 Cityscapes-test: 1525 images
10-16 11:50:58.478 Load model file: ckpts/cityscapes_cv2_wideresnet38_sdcaug.pth
10-16 11:50:58.537 Trunk: WideResnet38
10-16 11:50:59.417 Global Average Pooling Initialized
10-16 11:51:03.999 Model params = 137.1M
10-16 11:51:14.748 Checkpoint Load Compelete
10-16 22:47:38.892 IoU:
10-16 22:47:38.892 label_id      label    iU    Precision Recall TP     FP    FN
10-16 22:47:38.893  0                0    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.893  1                1    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.893  2                2    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.893  3                3    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.894  4                4    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.894  5                5    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.894  6                6    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.894  7                7    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.894  8                8    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.894  9                9    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.895 10               10    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.895 11               11    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.895 12               12    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.895 13               13    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.895 14               14    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.895 15               15    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.896 16               16    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.896 17               17    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.896 18               18    0.00   0.00      0.00    nan    0.00    0.00
10-16 22:47:38.896 mean 0.0

zhangyuan1994511 commented 5 years ago

@kwea123 @bryanyzhu I get the same eval log as yours, but I get an mIoU of 83.35, and I don't know why. I already checked the eval code and didn't find anything stochastic. I wrote 83.35 in my paper; if you find the reason, please inform me in time. Thank you very much!

zhangyuan1994511 commented 5 years ago

@bryanyzhu Hi, sorry to trouble you during the CVPR deadline. The reviewer said that my experimental result for your paper (83.35) is inconsistent with the Cityscapes leaderboard (83.5), and that the demo I present also shows differences. I have run "submit_cityscapes_WideResNet38.sh" again and got 83.3814. Is there something stochastic that makes it different each time, or some trick? Or could you provide the test results so that I can present them in my demo? Thank you very much! Looking forward to your help!

kwea123 commented 5 years ago

@zhangyuan1994511 83.3814 is the same as what I got above, so is there a chance that you made a mistake and 83.35 was not the result from cityscapes_best? Maybe 83.38 is the correct one and there is nothing stochastic? @bryanyzhu Anyway, we still need to figure out how to get the 83.454 on the leaderboard.

bryanyzhu commented 5 years ago

@kwea123 @karansapra @zhangyuan1994511 I don't have the bandwidth to fully investigate this at the moment, but we are trying to find the test results and will share them with you ASAP.

zhangyuan1994511 commented 5 years ago

@bryanyzhu That's great! If you give me the 83.5 results, I can use your test results directly in my paper. Thank you very much. You have been very helpful to me!

bryanyzhu commented 5 years ago

@kwea123 @zhangyuan1994511 Here is the link to our best results, sorry for the wait.

https://drive.google.com/open?id=1W72hODzScSX1jotRfbXeYeA-bXRnKtx7

zhangyuan1994511 commented 5 years ago

@bryanyzhu Thank you so much for your kind help! It will be very helpful for me.

bryanyzhu commented 5 years ago

Originally, this issue was about OOM during evaluation on a single GPU. That has been resolved, so the issue will be closed for now. If you run into new issues, please open another thread for clarity. Thank you.