LeapLabTHU / FLatten-Transformer

Official repository of FLatten Transformer (ICCV2023)

Experimental results with only dw conv #3

Closed Asthestarsfalll closed 1 year ago

Asthestarsfalll commented 1 year ago

Thank you for your excellent work! I noticed that adding dw conv brings a very significant performance improvement. Do you have results from experiments using dw conv alone? Another paper mentions that replacing the attention module with dw conv can achieve results comparable to or even better than the original, so I am curious about what role dw conv plays in FLatten.

tian-qing001 commented 1 year ago

Hi @Asthestarsfalll, we genuinely value your interest in and recognition of our work. Could you clarify whether you are specifically referring to using vanilla linear attention + depthwise convolution, without incorporating the focusing function?
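
For context, a minimal sketch of the distinction being asked about: linear attention computed with or without a focusing map on the queries and keys. The function names, the simple ReLU kernel, and the exact focusing form below are illustrative assumptions; the actual FLatten implementation is defined in this repository and the paper.

```python
import torch
import torch.nn.functional as F

def focusing_function(x, p=3, eps=1e-6):
    # Illustrative focusing map: sharpen the feature distribution by raising
    # non-negative features to a power while rescaling to preserve the norm.
    # The exact form used by FLatten is given in the paper/repo.
    x = F.relu(x) + eps
    xp = x ** p
    return xp * (x.norm(dim=-1, keepdim=True) / xp.norm(dim=-1, keepdim=True))

def linear_attention(q, k, v, use_focusing=True):
    # q, k, v: (batch, tokens, dim). Linear attention reorders the computation
    # so the cost is O(N * d^2) instead of O(N^2 * d).
    if use_focusing:
        q, k = focusing_function(q), focusing_function(k)
    else:
        q, k = F.relu(q) + 1e-6, F.relu(k) + 1e-6  # "vanilla" non-negative kernel
    kv = torch.einsum('bnd,bne->bde', k, v)                          # aggregate keys/values first
    z = 1.0 / (torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + 1e-6)   # normalizer
    return torch.einsum('bnd,bde,bn->bne', q, kv, z)
```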

Asthestarsfalll commented 1 year ago

Hi @tian-qing001, thank you for your reply. What I mean is replacing vanilla attention with depthwise convolution. At first I was curious about which component is dominant in FLatten, but upon reviewing the paper I mentioned above, I realized that I had misremembered the details: in addition to replacing vanilla attention with dwconv, that paper also adds some extra layers.

[Screenshot attachment: 2023-09-04_12-32]
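
As a rough illustration of the alternative being discussed, replacing the attention module entirely with a depthwise convolution as the token mixer might look like the hypothetical block below. This is only a sketch under assumed choices (kernel size, normalization, residual placement); it is not the configuration from the cited paper, which, as noted above, also adds extra layers.

```python
import torch.nn as nn

class DWConvMixer(nn.Module):
    # Hypothetical block where the token mixer is a single depthwise convolution
    # instead of attention; kernel size and normalization are arbitrary choices here.
    def __init__(self, dim, kernel_size=7):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)  # depthwise

    def forward(self, x):  # x: (B, C, H, W)
        return x + self.dwconv(self.norm(x))  # residual connection around the mixer
```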