About Multi-head Modular Modification

HenryHZY / VL-PET

[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"

MIT License


About Multi-head Modular Modification #3

Closed: 123456789asdfjkl closed this issue 1 year ago

123456789asdfjkl commented 1 year ago

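Hello! Thank you very much for your outstanding work! By the distributive law, the multi-head mechanism you designed is mathematically equivalent to directly using a single $\mathbf{W} \in \mathbb{R}^{d \times r}$. May I ask whether this modification is related to optimization, for example, whether it might be better suited to gradient descent?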
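The distributive-law argument in the question can be checked with a minimal Python sketch. It assumes (this is not taken from the VL-PET code) that each head $i$ is a purely linear low-rank path $h \mathbf{W}_{\text{down}}^{(i)} \mathbf{W}_{\text{up}}^{(i)}$ and that the head outputs are summed; the helpers `multi_head` and `single_head` are hypothetical names for illustration. Under that assumption, stacking the per-head down-projections column-wise yields one $d \times r$ matrix, and the sum over heads equals a single-head computation with it.

```python
# Hypothetical illustration (not the VL-PET implementation): a purely linear
# multi-head low-rank modification collapses into one d x r down-projection.
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hstack(blocks: List[Matrix]) -> Matrix:
    """Column-wise concatenation [W_1 | W_2 | ...]."""
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]


def vstack(blocks: List[Matrix]) -> Matrix:
    """Row-wise concatenation [W_1; W_2; ...]."""
    return [row for blk in blocks for row in blk]


def multi_head(h: Matrix, downs: List[Matrix], ups: List[Matrix]) -> Matrix:
    """Assumed multi-head form: sum over heads of (h @ W_down_i) @ W_up_i."""
    out = None
    for w_down, w_up in zip(downs, ups):
        head = matmul(matmul(h, w_down), w_up)
        out = head if out is None else add(out, head)
    return out


def single_head(h: Matrix, downs: List[Matrix], ups: List[Matrix]) -> Matrix:
    """Same computation with one W_down (d x r) and one W_up (r x d)."""
    w_down = hstack(downs)  # r = sum of per-head ranks
    w_up = vstack(ups)
    return matmul(matmul(h, w_down), w_up)


if __name__ == "__main__":
    h = [[1, 2, 0], [0, 1, 3]]                 # 2 tokens, d = 3
    downs = [[[1], [0], [2]], [[0], [1], [1]]]  # two heads, rank 1 each
    ups = [[[1, 0, 1]], [[2, 1, 0]]]
    print(multi_head(h, downs, ups))   # [[5, 2, 1], [14, 4, 6]]
    print(single_head(h, downs, ups))  # [[5, 2, 1], [14, 4, 6]]
```

Identical outputs for identical weights only cover the forward computation under this linear assumption; they do not by themselves determine how training behaves, which is the part of the question the reply below answers empirically.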

HenryHZY commented 1 year ago

Hi @123456789asdfjkl, the Multi-head Modular Modification was actually motivated by our empirical studies. Our experiments in Figure 4 demonstrate the benefits of Multi-head Modular Modification over Single-head Modular Modification. If you are interested in delving deeper into why multiple heads help, you may gain valuable insights from existing analyses of the benefits of the multi-head attention mechanism in Transformers.

123456789asdfjkl commented 1 year ago

OK, thank you for your answer.

HenryHZY commented 1 year ago

Thanks for your issue. I think we can close this issue now :)