VITA-Group / FasterSeg

[ICLR 2020] "FasterSeg: Searching for Faster Real-time Semantic Segmentation" by Wuyang Chen, Xinyu Gong, Xianming Liu, Qian Zhang, Yuan Li, Zhangyang Wang
MIT License

Where does weight sharing take place in model_search.py? #22

Closed · maaft closed this issue 4 years ago

maaft commented 4 years ago

In your paper you write:

It is worth noting that, the multi-resolution branches will share both cell weights and feature maps if their cells are of the same operator type, spatial resolution, and expansion ratio. This design contributes to a faster network. Once cells in branches diverge, the sharing between the branches will be stopped and they become individual branches (See Figure 6).

This gives the impression that cells can actually choose their downsampling rate (s ∈ {8, 16, 32}) freely. But this is not reflected in your code, where each cell is initialized at a fixed downsampling rate: https://github.com/TAMU-VITA/FasterSeg/blob/bb52a004ff83f64d3dd8989104234fdb862d1cc5/search/model_search.py#L153

Furthermore, when cells are placed at fixed downsampling rates, I don't see how the weight sharing between branches described in your paper (quoted above) can ever take place, since the spatial resolutions of different branches always differ.

tl;dr: Can you elaborate on how cells from different branches can be fused when, according to your code, they always operate on different spatial resolutions? (A rough sketch of how I read the initialization is below.)
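
For reference, this is how I read that initialization (made-up class and argument names, not the actual code in model_search.py): each cell is bound to a single downsampling rate at construction, so a branch at s=8 never produces s=16 or s=32 feature maps on its own.

```python
# Rough sketch of my reading of the search network -- hypothetical names,
# not the repository's actual classes.
import torch.nn as nn

class Cell(nn.Module):
    def __init__(self, channels, downsample_rate):
        super().__init__()
        self.downsample_rate = downsample_rate  # fixed when the cell is built
        self.op = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.op(x)

# One branch per fixed rate, as I understand the code: the rate is never
# chosen by the cell itself during search.
branches = {s: nn.ModuleList([Cell(64, s) for _ in range(4)]) for s in (8, 16, 32)}
```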

chenwydj commented 4 years ago

Hi @maaft

The cell at each position (blue node in Figure 1) contains two operators: they take the same input but produce outputs with different strides. These two operators do not share weights.

https://github.com/TAMU-VITA/FasterSeg/blob/master/search/model_search.py#L121
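
If it helps, here is a minimal toy sketch of that idea (hypothetical names, not the exact code at the link above): one cell position holds two operators that consume the same input but emit outputs at two strides, with independent weights.

```python
# Toy illustration only -- not the FasterSeg implementation.
import torch
import torch.nn as nn

class TwoStrideCell(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # stride-1 path keeps the current resolution
        self.op_keep = nn.Conv2d(c_in, c_out, 3, stride=1, padding=1)
        # stride-2 path downsamples; its weights are separate from op_keep
        self.op_down = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)

    def forward(self, x):
        return self.op_keep(x), self.op_down(x)

x = torch.randn(1, 64, 32, 64)
same_res, half_res = TwoStrideCell(64, 64)(x)
print(same_res.shape, half_res.shape)  # (1, 64, 32, 64) and (1, 64, 16, 32)
```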

maaft commented 4 years ago

Yes, I understand that, but then what is the "weight sharing" you mention in your paper, and where does it happen?

chenwydj commented 4 years ago

Weight sharing means the subnetworks reuse the same set of supernet weights for proxy inference, instead of each subnetwork being trained from scratch during the search. It was first proposed in ENAS. You can find more papers that specifically discuss weight sharing on arXiv. Typical NAS methods such as DARTS, ENAS, and one-shot NAS all leverage weight sharing.
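
As a generic toy illustration (not FasterSeg's actual implementation): every sampled subnetwork below picks one candidate op per layer, but all of them are evaluated with the same persistent supernet weights, so none is trained from scratch.

```python
# Generic weight-sharing NAS sketch -- assumptions only, not this repo's code.
import random
import torch
import torch.nn as nn

class SharedLayer(nn.Module):
    """One supernet layer holding all candidate ops; their weights persist
    across every subnetwork sampled during the search."""
    def __init__(self, channels):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])

    def forward(self, x, choice):
        return self.candidates[choice](x)

supernet = nn.ModuleList([SharedLayer(16) for _ in range(3)])
x = torch.randn(2, 16, 8, 8)

# Proxy inference for two different sampled subnetworks: both reuse the same
# supernet parameters.
for _ in range(2):
    arch = [random.randrange(3) for _ in supernet]
    out = x
    for layer, choice in zip(supernet, arch):
        out = layer(out, choice)
    print(arch, out.shape)
```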

Hope that helps.

maaft commented 4 years ago

Okay, thank you very much for clarifying! You have been very kind and helpful!