HenryHZY / VL-PET

[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"

About Multi-head Modular Modification #3

Closed 123456789asdfjkl closed 1 year ago

123456789asdfjkl commented 1 year ago

Hello! Thank you very much for your excellent work! By the distributive law, the multi-head mechanism you designed is mathematically equivalent to directly using a single weight matrix $\mathbf{W} \in \mathbb{R}^{d \times r}$. May I ask whether this modification is related to optimization, e.g., perhaps making the model better suited to gradient descent?
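
For concreteness, here is a minimal PyTorch sketch of the equivalence I mean (illustrative code only, not taken from the VL-PET repository; the sizes `d`, `r` and head count `h` are arbitrary, and the multi-head form is assumed to sum the per-head down-projections):

```python
# Illustrative sketch (not VL-PET code): splitting the input into h heads,
# down-projecting each chunk with its own W_i in R^{(d/h) x r}, and summing
# the results equals one full projection with W in R^{d x r} whose rows are
# the stacked per-head weights. Sizes below are arbitrary.
import torch

d, r, h = 768, 96, 12          # hidden size, bottleneck size, number of heads
x = torch.randn(1, d)          # one token representation

# Per-head down-projection weights W_i in R^{(d/h) x r}
W_heads = [torch.randn(d // h, r) for _ in range(h)]

# Multi-head form: project each chunk of x separately, then sum
chunks = x.split(d // h, dim=-1)                       # h chunks of size d/h
multi_head = sum(c @ W for c, W in zip(chunks, W_heads))

# Single-head form: stack the per-head weights vertically into W in R^{d x r}
W_full = torch.cat(W_heads, dim=0)
single_head = x @ W_full

# By the distributive law the two outputs coincide (up to float error)
print(torch.allclose(multi_head, single_head, atol=1e-4))  # True
```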

HenryHZY commented 1 year ago

Hi @123456789asdfjkl Actually, the Multi-head Modular Modification is motivated by our empirical studies. Our experiments in Figure 4 demonstrate the benefits of the Multi-head Modular Modification over the Single-head Modular Modification. If you are interested in delving deeper into the benefits of the multi-head design, I think you can gain valuable insights from existing analyses of the multi-head attention mechanism in Transformer.

123456789asdfjkl commented 1 year ago

Got it. Thank you for your explanation!

HenryHZY commented 1 year ago

Thanks for your issue. I think we can close this issue now :)