@karansapra Could you look into this? I tried on my end and found that we need at least 3 GPUs to run the evaluation code, otherwise it causes an OOM error.
@sde123 Can I ask how many GPU cards you are using to run the evaluation? And how much memory does each card have? Thank you.
@bryanyzhu Thanks for your help. I have 2 GPUs (two GTX 1080 Ti), about 10 GB each, so I have about 20 GB of GPU memory in total.
@bryanyzhu Thanks for your help. Is there any way to solve this problem? Must we have at least 3 GPUs to run the evaluation code?
@sde123 Looking into the issue, will let you know asap.
@bryanyzhu Thanks for your help. I do not understand why it costs so much GPU memory. Two GTX 1080 Ti cards are still not enough.
@sde123 The reason is that when we do sliding-based multi-scale evaluation with flipping, sometimes the input can be 6x3x2048x2048. This costs a lot of GPU memory. I made a simple fix where I forward this data 6 times, so each time the input is just 1x3x2048x2048; this works on my end with one V100 card (with 16GB memory).
I changed the lines here from
with torch.no_grad():
    input_crops = Variable(input_crops.cuda())
    output_scattered = model(input_crops)
to
torch.cuda.empty_cache()
with torch.no_grad():
    bi, _, hi, wi = input_crops.size()
    if hi >= args.crop_size:
        # Large crops: forward one crop at a time to keep peak memory low.
        output_scattered_list = []
        for b_idx in range(bi):
            cur_input = input_crops[b_idx, :, :, :].unsqueeze(0).cuda()
            cur_output = model(cur_input)
            output_scattered_list.append(cur_output)
        output_scattered = torch.cat(output_scattered_list, dim=0)
    else:
        # Small crops: forward the whole batch at once, as before.
        input_crops = input_crops.cuda()
        output_scattered = model(input_crops)
I haven't fully tested this change yet, but I think it should work. If you still experience the OOM issue, please change --scales 0.5,1.0,2.0 to --scales 0.5,1.0.
I will continue to look into this issue as well as several efficiency issues during evaluation. I will update the repo very soon.
@bryanyzhu Thank you for your help. I run this in the terminal: ./scripts/submit_cityscapes_WideResNet38.sh ./cityscapes_best.pth ./result. Do I need to specify multi-GPU in the command? I find nothing about multi-GPU in eval.py.
No, you don't need to. But it is a good point; that is also one thing I want to add to the eval.py script.
@bryanyzhu @sde123 Maybe it makes sense to put back fp16 inference? I can add it to the feature list.
@karansapra Good point, it is nice to have fp16 inference. Thank you for adding it.
:+1:
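For reference, a minimal sketch of what fp16 inference could look like here (an illustration only, not the repo's eventual implementation; model and input_crops are placeholder names). Casting weights and activations to half precision roughly halves activation memory during the forward pass.

```python
# Hedged sketch of fp16 inference; `model` and `input_crops` are hypothetical
# placeholders, not names taken from eval.py.
import torch

def infer_fp16(model, input_crops):
    model = model.half().cuda().eval()             # cast weights to fp16
    with torch.no_grad():
        output = model(input_crops.half().cuda())  # cast inputs to fp16
    return output.float()                          # back to fp32 for argmax/metrics
```

Batch-norm layers are sometimes kept in fp32 for numerical stability during training, but for inference-only use a plain cast like this is usually sufficient.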
Still debugging an issue for single GPU. Should have it resolved over this long weekend. Wanted to keep you posted.
Hi @sde123 @bryanyzhu Could you try this out? It should support single-GPU sliding window inference. It's slow, but it should work. https://github.com/NVIDIA/semantic-segmentation/tree/sliding_inference_single_gpu
@sde123 Let me know if you are seeing any issues. :)
@karansapra I still have the GPU OOM issue when using a single GPU. I'm using the command below; does this look OK?
#!/usr/bin/env bash
echo "Running inference on" ${1}
echo "Saving Results :" ${2}
sleep 10
PYTHONPATH=$PWD:$PYTHONPATH CUDA_VISIBLE_DEVICES=0 python eval.py \
--dataset cityscapes \
--arch network.deepv3.DeepWV3Plus \
--inference_mode sliding \
--scales 0.5,1.0,2.0 \
--split val \
--cv_split 2 \
--dump_images \
--ckpt_path ${2} \
--snapshot ${1}
@bryanyzhu Looks correct. Are you using the new branch? Also, how much memory does your GPU have?
@karansapra Yes, I think so, because if I weren't using the new branch, the sleep 10 wouldn't be in the script. But let me check again today. My GPU has 16GB.
@karansapra Yep, still OOM, confirmed using the new branch. The error log is below:
Running inference on ../pretrained_models/cityscapes_best_wideresnet38.pth
Saving Results : ./logs/
Using regular batch norm
Logging : ./logs/../val/eval_2019_09_24_22_57_08_rank_0.log
09-24 22:57:08.038 Network Arch: network.deepv3.DeepWV3Plus
09-24 22:57:08.038 CV split: 2
09-24 22:57:08.038 Exp_name: ..
09-24 22:57:08.038 Ckpt path: ./logs/
09-24 22:57:08.038 Scales : 0.5 1.0 2.0
09-24 22:57:08.038 Inference mode: sliding
09-24 22:57:08.038 val fine cities: ['train/monchengladbach', 'train/strasbourg', 'train/stuttgart']
09-24 22:57:08.042 Cityscapes-val: 655 images
09-24 22:57:08.042 Load model file: ../pretrained_models/cityscapes_best_wideresnet38.pth
09-24 22:57:08.053 Trunk: WideResnet38
09-24 22:57:09.185 Global Average Pooling Initialized
=====================Could not load ImageNet weights=======================
Please download the ImageNet weights of WideResNet38 in our repo to ./pretrained_models.
09-24 22:57:14.505 Model params = 137.1M
09-24 22:57:15.141 Checkpoint Load Compelete
eval val: 0%| | 0/655 [00:00<?, ?it/s]Traceback (most recent call last):
File "eval.py", line 599, in <module>
main()
File "eval.py", line 590, in main
runner.inf(imgs, img_names, gt, inference, net, scales, pbar, base_img)
File "eval.py", line 451, in inf
prediction_pre_argmax_collection = inference(net, img, scales)
File "eval.py", line 318, in inference_sliding
output_scattered_by_part_1 = model(input_crops[:part_1]).cpu()
File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/yizhu/code/semantic-segmentation/network/deepv3.py", line 282, in forward
dec1 = self.final(dec0)
File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/nvseg_pth10_py37_cuda10/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 4.51 GiB (GPU 0; 15.75 GiB total capacity; 8.18 GiB already allocated; 3.00 GiB free; 3.52 GiB cached)
It's weird. So you can run this on your end with one GPU? Thanks.
Hmm, weird. Let me have a look and get back. Thanks for checking, @bryanyzhu!
Any news on this issue? I am also having trouble running inference on a single GPU with a batch size greater than 1. I modified the demo.py script to run on a custom dataset. Sometimes it is able to allocate memory and successfully run inference, whereas other times it just gives the CUDA out of memory error. I don't know why this is happening, but I suspect the network requests some spike in memory on initialization, because when it goes through without crashing it only needs around 4GB. Any way to solve this memory issue? Thanks.
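One quick way to check whether such a spike shows up on the GPU allocator side is to read PyTorch's peak-memory counter around a forward pass. A hedged sketch (net and img are hypothetical placeholders for the loaded model and a preprocessed input tensor, not names from demo.py):

```python
# Hedged sketch for measuring peak GPU memory around a single forward pass.
# `net` and `img` are hypothetical placeholders.
import torch

with torch.no_grad():
    _ = net(img.cuda())
peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory allocated: {peak_gib:.2f} GiB")
```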
Hi authors, thank you for sharing your great work! I see the mIoU is 83.454 on the Cityscapes leaderboard, but when I submit the test results produced with "submit_cityscapes_WideResNet38.sh" and the released "cityscapes_best.pth", I get an mIoU of 83.351. I don't know why; I checked the test code and didn't find any problem. Could you please tell me the reason? Looking forward to your help, thank you very much!
@bryanyzhu
@zhangyuan1994511 I will double-check the code to see if there is any difference. In the meantime, can you check your input as well? Maybe some images are missing or the predictions are not being saved correctly. Check if your submission contains 1525 images.
Thanks for your help. I have already checked the predictions, and the submission contains 1525 images without any problems. I will submit a paper about semantic segmentation in which I cite your paper and compare against your results, so I need the same results as the Cityscapes leaderboard and your paper. Are there any other parameters not included in "submit_cityscapes_WideResNet38.sh", or anything I need to be especially careful about? Looking forward to your reply, thank you very much!
@bryanyzhu
@zhangyuan1994511 I briefly checked the script and found nothing weird. At this moment, you can use either number (83.45 or 83.35) for the comparison and include it in your submission. I will investigate this after the CVPR deadline.
@shecker Sorry for the slow response, I will look into this issue after the CVPR deadline.
@bryanyzhu Hi, no worries, I actually solved the issue. It was hardware related: it worked on a Titan Xp but not on an older Titan X or a 1080 Ti, even though memory usage, once inference had started, was only around 6GB. Could be some peak memory requirement on initialization.
@shecker Thank you for the info. Yes, I think so.
@shecker I may have found the reason: please change this line to False. Then there is no peak memory usage. For me, if I turn it on, there is a peak memory usage of 12.5G somewhere near the beginning. After I turn it off, the maximum memory usage is 5G, which is the normal usage as expected. Please try it on your side to see if this is the case. Thank you.
Right now, this only works for demo.py. I will investigate further and push a change for both demo.py and eval.py.
For eval.py, using both my previous tricks and turning off cudnn.benchmark limits the maximum memory usage to 12.5G. But I want to make it even less, because theoretically, a forward pass of a 3x2048x2048 image only needs about 9G of memory.
@sde123 @karansapra
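For anyone hitting the same spike, a minimal sketch of the setting being discussed, assuming the referenced line toggles PyTorch's cuDNN autotuner flag:

```python
# Hedged sketch: disabling cuDNN benchmark mode (assuming the line referenced
# above sets this flag). Benchmark mode auto-tunes convolution algorithms and
# may allocate large temporary workspaces, which can appear as a peak-memory
# spike on the first forward pass.
import torch

torch.backends.cudnn.benchmark = False  # lower peak memory, possibly slower convolutions
```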
@bryanyzhu Thank you very much for your reply; I will use the 83.35 result. If you find the reason, please let me know. Wish you success in your work!
@bryanyzhu @zhangyuan1994511 I used cityscapes_best but got 83.38... Does the evaluation method have something stochastic that makes it different each time?
Eval log:
10-16 11:50:58.333 Network Arch: network.deepv3.DeepWV3Plus
10-16 11:50:58.334 CV split: 0
10-16 11:50:58.334 Exp_name:
10-16 11:50:58.334 Ckpt path: submit2
10-16 11:50:58.334 Scales : 0.5 1.0 2.0
10-16 11:50:58.334 Inference mode: sliding
10-16 11:50:58.476 Cityscapes-test: 1525 images
10-16 11:50:58.478 Load model file: ckpts/cityscapes_cv2_wideresnet38_sdcaug.pth
10-16 11:50:58.537 Trunk: WideResnet38
10-16 11:50:59.417 Global Average Pooling Initialized
10-16 11:51:03.999 Model params = 137.1M
10-16 11:51:14.748 Checkpoint Load Compelete
10-16 22:47:38.892 IoU:
10-16 22:47:38.892 label_id label iU Precision Recall TP FP FN
10-16 22:47:38.893 0 0 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.893 1 1 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.893 2 2 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.893 3 3 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.894 4 4 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.894 5 5 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.894 6 6 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.894 7 7 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.894 8 8 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.894 9 9 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.895 10 10 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.895 11 11 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.895 12 12 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.895 13 13 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.895 14 14 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.895 15 15 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.896 16 16 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.896 17 17 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.896 18 18 0.00 0.00 0.00 nan 0.00 0.00
10-16 22:47:38.896 mean 0.0
@kwea123 @bryanyzhu I get the same eval log as yours, but I get an mIoU of 83.35; I don't know why. I have already checked the eval code and didn't find anything stochastic. I wrote 83.35 in my paper. If you find the reason, please inform me in time. Thank you very much!
@bryanyzhu Hi, author, I'm sorry to trouble you during the CVPR deadline. The reviewer said that the result of my experiment with your model (83.35) is inconsistent with the Cityscapes leaderboard (83.5), and the demo I present also shows differences. I ran "submit_cityscapes_WideResNet38.sh" again and got 83.3814. Is there something stochastic that makes it different each time, or some tricks? Or could you provide the test results so I can present them in my demo? Thank you very much! Looking forward to your help!
@zhangyuan1994511 83.3814 is the same as what I got above; is there a chance that you made some mistake and 83.35 was not the result with cityscapes_best? Maybe 83.38 is the correct one and there is nothing stochastic?
@bryanyzhu Anyway, we still need to figure out how to get the 83.454 from the leaderboard...
@kwea123 @karansapra @zhangyuan1994511 I don't have the bandwidth to fully investigate this at the moment, but we are trying to find the test results and will share them with you ASAP.
@bryanyzhu That's great! If you give me the 83.5 results, I can use your test results directly in my paper. Thank you very much. You have been very helpful to me!
@kwea123 @zhangyuan1994511 Here is the link to our best results, sorry for the wait.
https://drive.google.com/open?id=1W72hODzScSX1jotRfbXeYeA-bXRnKtx7
@bryanyzhu Thank you so much for your kind help! It will be very helpful for me.
Originally, this issue was about OOM during evaluation on a single GPU. That has been resolved, so the issue will be closed for now. If you run into new issues, please open another thread for clarity. Thank you.
Hi authors, thank you for sharing your great work! I met a problem and need your help when running your code (./scripts/submit_cityscapes_WideResNet38.sh ./cityscapes_best.pth ./result):
RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 10.91 GiB total capacity; 8.29 GiB already allocated; 1.06 GiB free; 877.60 MiB cached)
Could you please tell me how to solve it? Looking forward to your help, thank you very much!