allenai / elastic

Apache License 2.0

Instance specific? #4

Closed: yukang2017 closed this issue 5 years ago

yukang2017 commented 5 years ago

Hi,

Thanks for sharing the good idea and code.

I do not understand how the network structure is instance specific or learned from data. What I see is only that different input images produce different activation distributions.

For example, consider the DenseNet block with Elastic shown below. It appears to be a fixed, data-independent architecture. [image]

From the code, it also appears to be a fixed architecture: https://github.com/allenai/elastic/blob/57345c600c63fbde163c41929d6d6dd894d408ce/models/densenet.py#L9

Would you please provide some explanation?

Thanks in advance!

Best, Yukang

csrhddlam commented 5 years ago

Hi Yukang,

Thanks for your interest in our work.

It is correct that Elastic has a fixed, data-independent architecture. It does not learn explicit policies, such as a hard selection of which layer or filter to run. However, at each layer we provide more than one resolution path, so the model can focus more or less on each scaling path at each layer. The "different activation distributions" observed and shown in Section 4.1.1 implicitly resemble a soft dynamic scaling policy, analogous to soft gating (vs. hard gating) or soft attention (vs. hard indexing). That said, I think it would also be interesting and promising to extend this work with techniques studied in "Convolutional Networks with Adaptive Inference Graphs" or "You Look Twice: GaterNet for Dynamic Filter Selection in CNNs".

Best, Huiyu
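
The soft vs. hard gating distinction mentioned in the reply above can be illustrated with a small sketch. This is not code from the Elastic repository, and Elastic has no explicit gate or score; the functions `soft_gate` and `hard_gate` and the toy numbers are hypothetical and only show the general difference between blending several path outputs and selecting exactly one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_gate(path_outputs, scores):
    """Blend all path outputs with weights that sum to 1 (every path runs)."""
    weights = softmax(scores)
    return sum(w * y for w, y in zip(weights, path_outputs))


def hard_gate(path_outputs, scores):
    """Select the single highest-scoring path and ignore the others."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return path_outputs[best]


# Two hypothetical path outputs (e.g. a full-resolution and a low-resolution path).
outputs = [1.0, 3.0]
scores = [0.1, 2.0]

print("soft:", soft_gate(outputs, scores))  # a weighted mix of both outputs
print("hard:", hard_gate(outputs, scores))  # exactly one of the outputs
```

In the soft case both paths always contribute and only their relative weight changes; in the hard case the computation graph itself changes per input, which is what the two papers named above study.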

yukang2017 commented 5 years ago

Thanks for your explanation.

ngchc commented 5 years ago

Thanks for sharing the code and insightful idea.

Yet, in my opinion, the above explanation is still confusing. If the architecture is data-independent, in what sense is it "instance specific", and in what sense does it achieve a "dynamic scaling policy"?

Specifically, in Figure 1, the detailed description of "Elastic" is "allowing different scaling policies for different input images". In the provided implementation, however, the scaling policy is the same fixed one for every input image (that is, half downsample-and-upsample and half an ordinary conv stack).
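
One possible reading of the two positions in this thread is sketched below with a toy 1D block. It is an illustration under stated assumptions, not the Elastic implementation: the kernels, the helper names, and the energy-share measure are all hypothetical. The block has a fixed structure and fixed weights (one full-resolution path, one downsample-and-upsample path), yet the fraction of output energy coming from each path differs between inputs.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def conv3(v, kernel):
    """Same-length 3-tap filter with zero padding."""
    padded = [0.0] + list(v) + [0.0]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(v))
    ]


def downsample2(v):
    """Average-pool by a factor of 2 (assumes an even length)."""
    return [(v[i] + v[i + 1]) / 2.0 for i in range(0, len(v), 2)]


def upsample2(v):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [x for x in v for _ in range(2)]


# Hypothetical fixed weights, identical for every input.
FULL_RES_KERNEL = [-0.5, 1.0, -0.5]  # responds to fine detail
LOW_RES_KERNEL = [0.25, 0.5, 0.25]   # responds to coarse structure


def two_path_block(v):
    """Fixed architecture: both paths always run."""
    full = relu(conv3(v, FULL_RES_KERNEL))
    low = upsample2(relu(conv3(downsample2(v), LOW_RES_KERNEL)))
    return full, low


def low_res_share(v):
    """Fraction of total output energy contributed by the low-resolution path."""
    full, low = two_path_block(v)
    e_full = sum(x * x for x in full)
    e_low = sum(x * x for x in low)
    total = e_full + e_low
    return e_low / total if total > 0 else 0.0


smooth = [1.0] * 8        # coarse structure only
spiky = [1.0, -1.0] * 4   # fine detail only

print("smooth input, low-res share:", low_res_share(smooth))
print("spiky input,  low-res share:", low_res_share(spiky))
```

In this toy, the computation performed is identical for both inputs, which matches the observation that the policy is fixed. What varies is how much each path contributes to the output, which is the sense in which the earlier reply calls the behaviour a soft, implicit scaling policy.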