MenghaoGuo / AutoDeeplab

PyTorch implementation of the paper "Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation"
https://arxiv.org/abs/1901.02985

about the crop image size for input #33

Closed Randylcy closed 5 years ago

Randylcy commented 5 years ago

In the original code, the crop size is set to 224, but an "out of memory" error occurs on my GPU (Tesla V100, 32 GB). Has anyone else met this problem? For now I have set the crop size to 128 and it works. But for semantic segmentation, shouldn't we avoid cropping the images too small? Looking forward to good answers.
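For reference, a minimal sketch of where the crop size would be changed, assuming a torchvision-style transform pipeline (the names below are placeholders, not this repo's exact API; a real segmentation dataloader crops the image and mask jointly):

```python
import torchvision.transforms as T

# Hypothetical transform pipeline; the repo's own dataloader may differ.
# Reducing the crop from 224 to 128 shrinks activation memory roughly
# quadratically (128^2 / 224^2 ~ 0.33x), which is why it avoids the OOM.
crop_size = 128  # was 224 in the original code

train_transform = T.Compose([
    T.RandomCrop(crop_size, pad_if_needed=True),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
```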

HankKung commented 5 years ago

I use batch size 4 and crop size 224 on the same device as yours. Did you try the spec from the paper (batch size 2 and crop size 320)?

Randylcy commented 5 years ago

I use batch size 4 and crop size 224 on the same device as yours. Did you try the spec from the paper (batch size 2 and crop size 320)?

Why don't we chat on WeChat? 18463102232

Randylcy commented 5 years ago

I use batch size 4 and crop size 224 on the same device as yours. Did you try the spec from the paper (batch size 2 and crop size 320)?

I get the out-of-memory problem when I try to enlarge the input size. Maybe it is because I changed something in the original code; I had better try your code.

HankKung commented 5 years ago

I use batch size 4 and crop size 224 on the same device as yours. Did you try the spec from the paper (batch size 2 and crop size 320)?

I get the out-of-memory problem when I try to enlarge the input size. Maybe it is because I changed something in the original code; I had better try your code.

I assume it's because you set num_channel to 40. Yes, it is set to 40 in the paper's experiments, but they have certainly done some optimization in their code.

Randylcy commented 5 years ago

I use batch size 4 and crop size 224 on the same device as yours. Did you try the spec from the paper (batch size 2 and crop size 320)?

I get the out-of-memory problem when I try to enlarge the input size. Maybe it is because I changed something in the original code; I had better try your code.

I assume it's because you set num_channel to 40. Yes, it is set to 40 in the paper's experiments, but they have certainly done some optimization in their code.

Yes! You are right, I verified it. And I find a very interesting thing: Google used just one P100 GPU, which only has 16 GB, and searched for 3 days, while we, with a 32 GB V100, cannot use the same input size and channel number as they did. So strange. Maybe it is about the code; we must optimize the code!
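One way to check which setting (crop size vs. num_channel) dominates memory is to measure the peak allocation of a single forward/backward pass. A minimal sketch using PyTorch's memory statistics (the model constructor and arguments shown are placeholders, not this repo's exact API):

```python
import torch

def peak_memory_mb(model, crop_size, batch_size=2):
    """Run one forward/backward pass and report peak CUDA memory in MB."""
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch_size, 3, crop_size, crop_size, device='cuda')
    out = model(x)
    if isinstance(out, (tuple, list)):  # some search models return several heads
        out = out[0]
    out.sum().backward()                # scalar stand-in loss, just to trigger backward
    return torch.cuda.max_memory_allocated() / 1024 ** 2

# Placeholder usage -- the constructor and its arguments are illustrative only:
# model = AutoDeeplab(num_classes=21, num_layers=12, filter_multiplier=20).cuda()
# print(peak_memory_mb(model, crop_size=224))
```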

HankKung commented 5 years ago

After splitting the training dataset into a weight set and an architecture set, the training time should be normal (3~4 days with batch size 2); I've merged the new code into master.
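For context, the split mentioned above follows the DARTS-style bilevel setup: one half of the training set updates the network weights and the other half updates the architecture parameters. A minimal sketch (the dataset here is a dummy stand-in, not the repo's actual loader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Dummy stand-in for the real segmentation dataset, just to show the split.
full_train_set = TensorDataset(
    torch.randn(100, 3, 224, 224),
    torch.zeros(100, 224, 224, dtype=torch.long),
)

# 50/50 split: one half for weight updates, one half for architecture updates.
n = len(full_train_set)
weight_set, arch_set = random_split(full_train_set, [n // 2, n - n // 2])

weight_loader = DataLoader(weight_set, batch_size=2, shuffle=True)
arch_loader = DataLoader(arch_set, batch_size=2, shuffle=True)

# Typical alternating loop: architecture parameters on one half, weights on the other.
# for (x_w, y_w), (x_a, y_a) in zip(weight_loader, arch_loader):
#     ... update architecture params on (x_a, y_a), then weights on (x_w, y_w)
```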

Still, the high GPU memory cost is a problem. I've consulted the paper's author about this; it seems he did not do any special optimization in the code, and he suggested enabling the cuDNN benchmark, which I tried, but it did not help much.
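For reference, enabling the cuDNN autotuner mentioned above is a one-liner; it can speed up convolutions when input shapes stay fixed, but it does not reduce memory, which is consistent with it not helping much here:

```python
import torch

# Let cuDNN benchmark several convolution algorithms and cache the fastest one
# per input shape; only effective when shapes are constant across iterations.
torch.backends.cudnn.benchmark = True
```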

Linfengscat commented 5 years ago

Same problem: I can only use batch size 1 and crop size 224 on a Tesla V100 (32 GB), which makes no sense since the paper claims the search finishes in 3 P100 GPU-days. I think the bottleneck always lies in the network structure determined by the channel number and crop size. There is very little room to optimize because there are too many connections and operations in the network.
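One generic memory-for-compute trade that could be tried here (not something the paper or this repo necessarily does) is gradient checkpointing around each search cell, so the activations of the many candidate operations are recomputed during backward instead of being stored. A hedged sketch, with `cells` standing in for the network's cell modules:

```python
from torch.utils.checkpoint import checkpoint

def run_cells(cells, x):
    """Run a sequence of search cells, recomputing activations in backward.

    Checkpointing each cell stores only its inputs and outputs, cutting
    activation memory at the cost of one extra forward pass per cell
    during the backward pass.
    """
    for cell in cells:
        x = checkpoint(cell, x)
    return x
```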