JiahuiYu / slimmable_networks

Slimmable Networks, AutoSlim, and Beyond, ICLR 2019, and ICCV 2019

Question about arbitrary width in a universally slimmable network. #36

Closed sseung0703 closed 4 years ago

sseung0703 commented 4 years ago

Hi, thank you for your great work. :) I have a question about arbitrary width in a universally slimmable network. In your paper, you mention that you sample a random width ratio for each sub-network, which is an improvement over your previous work because it is not discrete. However, it seems to me that the universally slimmable network still uses discrete widths with a fine step (0.025).

Can you explain why you didn't use a continuous random ratio?

JiahuiYu commented 4 years ago

It is continuous. Please check the code: https://github.com/JiahuiYu/slimmable_networks/blob/4bb2a623f02a183fe08a5b7415338f148f46b363/train.py#L414

Don't look at the yaml file; it is only used for inference, i.e., we show some sub-networks with a finite step of 0.025.
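
For reference, a minimal sketch of the idea (the names and the width range below are illustrative, not the exact code in train.py):

```python
import random

# Sketch: during universally slimmable training, each sampled sub-network
# uses a width ratio drawn from a continuous range, not from the fixed
# 0.025-step list in the yaml (that list is only for evaluation).
min_width, max_width = 0.25, 1.0  # assumed width range

def sample_width_mult():
    return random.uniform(min_width, max_width)  # continuous sample

print([round(sample_width_mult(), 4) for _ in range(4)])
```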

sseung0703 commented 4 years ago

Thank you for the rapid reply. I missed that line. Thanks. :)

JiahuiYu commented 4 years ago

No problem! It usually takes longer and longer to get a reply on my GitHub issues since I graduated and started working full-time. I was trying to clean up the issues just now. :)

sseung0703 commented 4 years ago

That's good news for me :D. I'm currently struggling to implement your work in TF, and your reply will be very helpful.

I have another question about arbitrary width. In my understanding, you use only one width ratio for each sub-network in the training phase because the default value of FLAGS.nonuniform is False. Have you ever tried using a fully arbitrary width ratio for each layer?

JiahuiYu commented 4 years ago

Implementing this with TF will cost a lot of time, as TF uses a static graph.

sseung0703 commented 4 years ago

I have already implemented AutoSlim with TF2, but I just want to make sure all the configuration is right. In the case of MobileNetV2 on CIFAR-10, the overall training time is less than 4 hours on a single GPU, which does not look so heavy.

If you want to visit my repository, you can find it at the link below :). https://github.com/sseung0703/Autoslim_TF2

By the way, would you kindly reply to my question above?

JiahuiYu commented 4 years ago

@sseung0703 `nonuniform` is True for AutoSlim. See branch v3.0.0.
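
In other words (a toy sketch with hypothetical names and values, not the repo's exact code), the uniform case shares one ratio across all layers, while the nonuniform case used for AutoSlim samples a ratio per layer:

```python
import random

# Hypothetical illustration of uniform vs. nonuniform width sampling.
min_width, max_width = 0.25, 1.0  # assumed width range
num_layers = 5                    # assumed number of slimmable layers

def sample_ratios(nonuniform: bool):
    if nonuniform:
        # one independent ratio per layer (AutoSlim-style)
        return [random.uniform(min_width, max_width) for _ in range(num_layers)]
    # a single ratio shared by every layer of the sampled sub-network
    ratio = random.uniform(min_width, max_width)
    return [ratio] * num_layers

print(sample_ratios(nonuniform=False))
print(sample_ratios(nonuniform=True))
```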

sseung0703 commented 4 years ago

Thanks, I found what you mentioned :).