facebookresearch / unnas

Code for "Are labels necessary for neural architecture search?"
MIT License

[reproducibility] Training unnas_citys_jig with 4 GPUs #5

Open xdeng7 opened 4 years ago

xdeng7 commented 4 years ago

I ran the unnas_citys_jig segmentation evaluation on the Cityscapes data, but I can only reach an mIoU of 0.102. I am using 4 GPUs and didn't change any other parameters in the default configuration file.

chenxi116 commented 4 years ago

Hi. Since you are using half the number of GPUs, I suggest halving the learning rate as well as the two batch sizes. Note that this may still cause some degradation, since semantic segmentation is known to prefer larger batch sizes for better BN statistics.

Also I am a bit surprised the original batch size can fit into half the number of GPUs: what's your GPU memory size? And what's the approximate wall clock time of your trial using 4 GPUs?
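For concreteness, here is a minimal sketch of the linear scaling rule suggested above. The helper name and the base values are placeholders for illustration, not the repo's actual defaults:

```python
# Hypothetical helper illustrating the linear scaling rule: scale the
# learning rate and both batch sizes by the same GPU ratio.
def scale_for_gpus(base_lr, train_bs, test_bs, base_gpus, num_gpus):
    ratio = num_gpus / base_gpus
    return base_lr * ratio, round(train_bs * ratio), round(test_bs * ratio)

# Going from 8 GPUs to 4 halves everything (base values are illustrative):
lr, train_bs, test_bs = scale_for_gpus(0.1, 64, 32, base_gpus=8, num_gpus=4)
print(lr, train_bs, test_bs)  # 0.05 32 16
```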

xdeng7 commented 4 years ago

My GPU memory is 16 GB. The training takes around a week using 4 GPUs.

xdeng7 commented 4 years ago

I am interested in using the searched architecture for some other applications. Is there a searched model pretrained on ImageNet available for download? Any model with ImageNet-pretrained weights would help.

larenzhang commented 4 years ago

Hi, I have a similar problem. I trained the imagenet_jig genotype on Cityscapes with 8 GPUs, using the default training settings from imagenet_jig.yaml except that I changed the batch size to 32 and the learning rate to 0.05 because of the 12 GB memory limit. After training for 770 epochs, the training mIoU is 0.68 but the test mIoU is about 0.1, which does not look normal. Can you give me some tips on how to solve this?

xdeng7 commented 4 years ago

> Hi, I have a similar problem. I trained the imagenet_jig genotype on Cityscapes with 8 GPUs, using the default training settings from imagenet_jig.yaml except that I changed the batch size to 32 and the learning rate to 0.05 because of the 12 GB memory limit. After training for 770 epochs, the training mIoU is 0.68 but the test mIoU is about 0.1, which does not look normal. Can you give me some tips on how to solve this?

I have the same problem. I have no idea whether anything went wrong during training or validation.

AlbertiPot commented 2 years ago

> Hi, I have a similar problem. I trained the imagenet_jig genotype on Cityscapes with 8 GPUs, using the default training settings from imagenet_jig.yaml except that I changed the batch size to 32 and the learning rate to 0.05 because of the 12 GB memory limit. After training for 770 epochs, the training mIoU is 0.68 but the test mIoU is about 0.1, which does not look normal. Can you give me some tips on how to solve this?

> I have the same problem. I have no idea whether anything went wrong during training or validation.

Same problem here: once the training mIoU climbs above 0.8, the test mIoU stays at around 0.1.
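A symptom like this (high train mIoU, near-zero test mIoU) is often an evaluation-mode issue with BatchNorm rather than a training failure. Below is a generic PyTorch sketch, not the repo's actual evaluation code, showing an mIoU computation with the model explicitly in eval mode; `num_classes=19` and `ignore_index=255` follow the usual Cityscapes convention:

```python
import torch

@torch.no_grad()
def evaluate_miou(model, loader, num_classes=19, ignore_index=255, device="cuda"):
    # model.eval() matters: with BatchNorm, evaluating in train mode uses
    # per-batch statistics and can collapse the test score even when the
    # training mIoU looks fine.
    model.eval()
    conf = torch.zeros(num_classes, num_classes, dtype=torch.long, device=device)
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)  # (N, H, W) class indices
        mask = labels != ignore_index       # drop ignored pixels
        idx = labels[mask] * num_classes + preds[mask]
        conf += torch.bincount(idx, minlength=num_classes ** 2).view(
            num_classes, num_classes)
    inter = conf.diag().float()
    union = conf.sum(0).float() + conf.sum(1).float() - inter
    # classes absent from the split contribute IoU 0 here; a stricter
    # implementation would mask them out of the mean.
    return (inter / union.clamp(min=1)).mean().item()
```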

https://github.com/facebookresearch/unnas/issues/8#issue-1347186404