Hello, I notice in this table that positional encoding without the forgetting gate gives a significant performance boost to the model: APE alone reaches 80.0% (+1.6% over the forgetting-gate baseline at 78.4%); LePE alone 81.6% (+3.2%); CPE alone 81.7% (+3.3%); RoPE alone 80.0% (+1.6%). How did you choose among them? In your code you add CPE + RoPE + LePE, but not APE. Have you done any more specific ablation experiments?
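For context on what such a CPE + RoPE + LePE combination typically looks like, here is a minimal NumPy sketch of one attention block that applies all three. This is my own illustration, not the authors' implementation: the function names (`dwconv3x3`, `rope`, `attn_block`) and the choice of a 3x3 depthwise convolution for both CPE and LePE are assumptions based on how these encodings are commonly described (CPE added to the tokens before attention, RoPE rotating queries/keys, LePE added to the attention output).

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 4, 4, 8          # toy token grid (H*W tokens) and channel dim
N = H * W

def dwconv3x3(t, H, W, w):
    """Depthwise 3x3 conv over the token grid; t: (H*W, C), w: (C, 3, 3)."""
    grid = t.reshape(H, W, -1)
    pad = np.pad(grid, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for i in range(3):
        for j in range(3):
            out += pad[i:i + H, j:j + W, :] * w[:, i, j]
    return out.reshape(H * W, -1)

def rope(t, pos, base=10000.0):
    """Rotary positional embedding; rotates channel pairs by position-dependent angles."""
    half = t.shape[1] // 2
    freqs = base ** (-np.arange(half) / half)
    ang = pos[:, None] * freqs[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = t[:, :half], t[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

# Random projection weights and per-channel depthwise kernels (illustrative only).
Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
w_cpe = rng.standard_normal((C, 3, 3)) * 0.1
w_lepe = rng.standard_normal((C, 3, 3)) * 0.1

def attn_block(x):
    x = x + dwconv3x3(x, H, W, w_cpe)            # CPE: conv-based encoding on tokens
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    pos = np.arange(N)
    q, k = rope(q, pos), rope(k, pos)            # RoPE: rotate queries and keys
    s = q @ k.T / np.sqrt(C)
    a = np.exp(s - s.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)            # softmax attention
    return a @ v + dwconv3x3(v, H, W, w_lepe)    # LePE: conv on V added to the output

out = attn_block(rng.standard_normal((N, C)))
print(out.shape)  # token count and channel dim are preserved
```

One property worth noting: RoPE is a pure rotation, so it preserves the norm of q and k and only injects relative-position information into the dot product, whereas CPE and LePE add learned local content, which may explain why the paper favors combining them rather than stacking APE on top.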