Does TAS apply weight sharing?

D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.

MIT License


Does TAS apply weight sharing? #29

Closed: Hanson13 closed this issue 4 years ago

Hanson13 commented 4 years ago

Hi, thank you for your excellent work! When searching for width, do the operations with different channel numbers share weights from the one-shot network? If so, how are the weights in the one-shot network updated, given that they receive K separate gradients (K = 2 in your framework figure), one from each of the K sampled operations? By the way, in "When $\tau \to 0$, $\hat{p} = [\hat{p}_1, \ldots, \hat{p}_j, \ldots]$ becomes one-shot, and the Gumbel-softmax distribution...", do you mean "one-hot" instead of "one-shot"?

D-X-Y commented 4 years ago

1. Yes, they share the weights. Please refer to https://github.com/D-X-Y/NAS-Projects/blob/master/lib/models/searchs/SearchCifarResNet_width.py#L81.

2. Yes, it is a typo and should be "one-hot".
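For readers following the thread, here is a minimal PyTorch sketch of both points. It is not the repository's code. It assumes narrower candidates reuse the leading filters of one weight tensor sized for the widest candidate, and it uses a plain temperature softmax without Gumbel noise to show the one-hot limit. The names `shared_weight`, `conv_with_width`, and `tempered_softmax`, and all sizes, are hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# One weight tensor sized for the widest candidate (illustrative sizes).
max_out, in_ch = 8, 3
shared_weight = torch.nn.Parameter(torch.randn(max_out, in_ch, 3, 3))


def conv_with_width(x, width):
    """Convolve using only the first `width` filters of the shared weight."""
    return F.conv2d(x, shared_weight[:width], padding=1)


x = torch.randn(2, in_ch, 5, 5)
sampled_widths = [4, 8]  # K = 2 sampled candidates

# Each sampled width contributes one loss term. Autograd sums the K
# gradients into shared_weight.grad, so a single optimizer step updates
# the shared weights using all sampled candidates.
loss = sum(conv_with_width(x, w).pow(2).mean() for w in sampled_widths)
loss.backward()

# Filters 0..3 receive gradient from both candidates,
# filters 4..7 only from the wider one.
grad = shared_weight.grad
print(grad[:4].abs().sum().item(), grad[4:].abs().sum().item())


def tempered_softmax(logits, tau):
    """Softmax with temperature tau (Gumbel noise omitted for determinism)."""
    return F.softmax(logits / tau, dim=-1)


logits = torch.tensor([1.0, 2.0, 0.5])
for tau in (10.0, 1.0, 0.01):
    # As tau shrinks, the probability mass concentrates on the argmax entry.
    print(tau, tempered_softmax(logits, tau))
```

With the loss written as a sum over the sampled widths, no special handling of the K gradients is needed in this sketch: they accumulate on the shared parameter during the backward pass.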