ashaw596 / squeezenas

MIT License

Training code for squeezenas #4

Open UsedToBe97 opened 4 years ago

UsedToBe97 commented 4 years ago

Hi, I found the paper for this repo really interesting. Would the author be kind enough to share the training code so that I can reproduce the results?

ashaw596 commented 4 years ago

Yes, we intend to release the training code in the future. I'm unsure about the timeline since we need to clean it up first. Not sure if it's helpful, but some (undocumented) cleaned-up utilities for supernet-style NAS are available here: https://github.com/ashaw596/supernet-nas

Feel free to contact me if you have questions!

UsedToBe97 commented 4 years ago

Thanks, Albert :) But I still have some questions about the MobileNetV3 baseline. Google reported that the full resolution of Cityscapes images is 1024x2048; do you have any idea what exact preprocessing they used? I am new to this area and do not have much experience with transfer learning on semantic segmentation tasks. I am planning to use MobileNetV3 as the backbone for transfer learning (starting from a pretrained ImageNet model, plugging it into a semantic segmentation framework, and finetuning the weights). Have you reproduced the MobileNetV3 baseline, and could you please give some suggestions on how to do such transfer learning?

Also, what image size did you use for training? Since LR-ASPP uses an AvgPool with kernel (49, 49) and stride [16, 20], a small image size may lead to an error.

ashaw596 commented 4 years ago

Unfortunately, I may not be able to help very much with replicating MobileNetV3. We didn't end up retraining it and only replicated the network for benchmarking purposes, so I'm not exactly sure about their preprocessing. My only knowledge is that I believe they used single-scale evaluation.

For our networks, we trained with 768x768 crops, horizontal flipping, and random scaling between 0.75 and 1.25.
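For anyone trying to reproduce this, a minimal sketch of that augmentation recipe might look like the following. This is my own interpretation, not the authors' code; it assumes CHW tensors (mask as 1xHxW) and that the scaled image is at least 768px on each side, which holds for 1024x2048 Cityscapes at scales >= 0.75:

```python
import random
from torchvision.transforms import InterpolationMode
import torchvision.transforms.functional as TF

def augment(image, mask, crop_size=768, scale_range=(0.75, 1.25)):
    # Random scaling between 0.75 and 1.25, applied jointly to image and labels.
    scale = random.uniform(*scale_range)
    h, w = image.shape[-2:]
    new_h, new_w = int(h * scale), int(w * scale)
    image = TF.resize(image, [new_h, new_w])
    # Labels must use nearest-neighbor interpolation to stay valid class ids.
    mask = TF.resize(mask, [new_h, new_w], interpolation=InterpolationMode.NEAREST)

    # Random 768x768 crop, same window for image and labels.
    top = random.randint(0, new_h - crop_size)
    left = random.randint(0, new_w - crop_size)
    image = TF.crop(image, top, left, crop_size, crop_size)
    mask = TF.crop(mask, top, left, crop_size, crop_size)

    # Random horizontal flip.
    if random.random() < 0.5:
        image, mask = TF.hflip(image), TF.hflip(mask)
    return image, mask
```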

UsedToBe97 commented 4 years ago

> Unfortunately, I may not be able to help very much with replicating MobileNetV3. We didn't end up retraining it and only replicated the network for benchmarking purposes, so I'm not exactly sure about their preprocessing. My only knowledge is that I believe they used single-scale evaluation.
>
> For our networks, we trained with 768x768 crops, horizontal flipping, and random scaling between 0.75 and 1.25.

I tried the 768x768 input, but the AvgPool layer (kernel 49x49, stride [16, 20]) receives an input of size [48, 48], which leads to an error. Could you please tell me how to deal with that?
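For reference, the shape arithmetic: a 768x768 crop at output stride 16 gives a 48x48 feature map, one pixel smaller than the 49x49 kernel, so the computed output size is floor((48 - 49)/16) + 1 = 0. A minimal repro (the 960-channel count is illustrative, not from the repo):

```python
import torch

# 768x768 crop at output stride 16 -> 48x48 feature map
features = torch.randn(1, 960, 48, 48)
pool = torch.nn.AvgPool2d(kernel_size=49, stride=[16, 20], count_include_pad=False)
pool(features)  # RuntimeError: calculated output size (0x0) is too small
```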

ashaw596 commented 4 years ago

I'm not completely sure. I believe that when we used it, it just treated that case as global pooling and didn't error. I believe we used the same head, where:

```python
self.avgpool = torch.nn.AvgPool2d(kernel_size=49, stride=[16, 20], count_include_pad=False)
```
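If it helps, one workaround (my own sketch, not code from the SqueezeNAS repo; `GuardedAvgPool` is a hypothetical name) is to fall back to global average pooling whenever the feature map is smaller than the kernel, which matches the "treated as global pooling" behavior described above:

```python
import torch
import torch.nn.functional as F

class GuardedAvgPool(torch.nn.Module):
    """Hypothetical wrapper: applies the 49x49 LR-ASPP-style pool when the
    input is large enough, otherwise falls back to global average pooling."""
    def __init__(self):
        super().__init__()
        self.avgpool = torch.nn.AvgPool2d(kernel_size=49, stride=[16, 20],
                                          count_include_pad=False)

    def forward(self, x):
        # Inputs smaller than the kernel would yield a zero-sized output,
        # so pool them down to 1x1 instead.
        if x.shape[-2] < 49 or x.shape[-1] < 49:
            return F.adaptive_avg_pool2d(x, output_size=1)
        return self.avgpool(x)
```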