bojone / rerope

Rectified Rotary Position Embeddings

Blogs in English #7

Closed NormXU closed 1 year ago

NormXU commented 1 year ago

Thank you very much for sharing your awesome work!

As the blogs mentioned in the README are in Chinese, I am working on translating them into English. I guarantee that I am not using any AI translation. I believe people working on expanding context length will love these blogs and draw inspiration from them.

I have finished some parts, and for people who can't read Chinese, please check

bojone commented 1 year ago

Wonderful, and I am grateful! Thank you for your effort. I have briefly read through your translation and found no technical errors; I even think it is written better than my original article!

By the way, the experiments in the blog were conducted on a Transformer model with 100 million parameters and a GAU (Gated Attention Unit) architecture. This is my small model for quick experiments; it doesn't have much academic value, so it's not open source.

Regarding GAU, you can refer to: https://arxiv.org/abs/2202.10447

bojone commented 1 year ago

If you permit, I will add your English version link to the README page.

NormXU commented 1 year ago

@bojone No problem. My pleasure :)