Closed: NormXU closed this issue 1 year ago.

NormXU:

Thank you very much for sharing your awesome work!

As the blogs mentioned in the README are in Chinese, I am working on translating them into English. I can assure you I am not using any AI translation. I believe people working on extending context length will love these blogs and draw inspiration from them.

I have finished some parts; for those who can't read Chinese, please check

bojone:
Wonderful, and much appreciated! Thank you for the effort you put into this. I have briefly read through your translation, found no technical errors, and even think it is written better than my original article!
By the way, the experiments in the blog were conducted on a Transformer model with 100 million parameters and a GAU (Gated Attention Unit) architecture. This is my small model for quick experiments; it doesn't have much academic value, so it is not open source.
Regarding GAU, you can refer to: https://arxiv.org/abs/2202.10447
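For readers who haven't seen the paper: a GAU block merges the gating of a GLU with single-head, squared-ReLU attention, so one block can replace the usual attention + FFN pair. Below is a minimal PyTorch sketch of the layer as I read it from the paper; the class and argument names are my own, and details such as normalization, relative position bias, and causal masking are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAU(nn.Module):
    """Minimal Gated Attention Unit, after arXiv:2202.10447.

    One GAU block plays the role of attention plus FFN: a GLU-style
    gate combined with single-head, squared-ReLU attention computed
    from one small shared projection.
    """
    def __init__(self, dim: int, expansion: int = 2, s: int = 128):
        super().__init__()
        e = dim * expansion
        self.s = s
        self.to_u = nn.Linear(dim, e)   # gate branch
        self.to_v = nn.Linear(dim, e)   # value branch
        self.to_z = nn.Linear(dim, s)   # shared base for queries and keys
        # cheap per-dimension scale/offset that turns z into q and k
        self.gamma = nn.Parameter(torch.randn(2, s) * 0.02)
        self.beta = nn.Parameter(torch.zeros(2, s))
        self.to_out = nn.Linear(e, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); pre-norm and masking omitted for brevity
        n = x.shape[1]
        u = F.silu(self.to_u(x))        # (b, n, e)
        v = F.silu(self.to_v(x))        # (b, n, e)
        z = F.silu(self.to_z(x))        # (b, n, s)
        q = z * self.gamma[0] + self.beta[0]
        k = z * self.gamma[1] + self.beta[1]
        logits = torch.einsum('bns,bms->bnm', q, k) / self.s ** 0.5
        a = F.relu(logits) ** 2 / n     # squared-ReLU attention weights
        return self.to_out(u * torch.einsum('bnm,bme->bne', a, v))
```

As a quick sanity check, `GAU(dim=512)(torch.randn(2, 64, 512))` returns a tensor of shape `(2, 64, 512)`.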
If you don't mind, I will add a link to your English version to the README page.
NormXU:

@bojone No problem. My pleasure :)