-
Hi,
Thank you for your great work!
We are interested in the capabilities of RetNet. However, when we look through this repository, we can't find the code corresponding to the paper's experiments. For example…
-
[xPos](https://arxiv.org/pdf/2212.10554.pdf) is an improved version of the original RoPE from the [RoFormer](https://arxiv.org/pdf/2104.09864.pdf) paper (i.e. a modification of ggml_rope with !is_neox…
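For readers comparing the two schemes: xPos keeps RoPE's rotation but additionally scales queries and keys by position-dependent factors that cancel into a pure relative-distance decay in the attention product. A rough NumPy sketch of the idea (illustrative only; the function names and the `scale_base` value here are my assumptions, not ggml's API):

```python
import numpy as np

def xpos_scale(dim, positions, scale_base=512.0):
    # Per-dimension base scale zeta_i in (0, 1], as in the xPos paper;
    # queries are multiplied by zeta**(n / scale_base), keys by its inverse.
    i = np.arange(0, dim, 2)
    zeta = (i + 0.4 * dim) / (1.4 * dim)              # shape (dim // 2,)
    return zeta[None, :] ** (positions[:, None] / scale_base)

def apply_rope(x, positions, theta=10000.0, scale=None):
    # Interleaved-pair RoPE rotation (the non-neox layout), with optional
    # xPos scaling folded into cos/sin.
    dim = x.shape[-1]
    freqs = theta ** (-np.arange(0, dim, 2) / dim)
    angles = positions[:, None] * freqs               # shape (T, dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    if scale is not None:
        cos, sin = cos * scale, sin * scale
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because a query at position n carries zeta**(n/B) and a key at position m carries zeta**(-m/B), their dot product picks up zeta**((n-m)/B), a decay depending only on relative distance.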
-
### Thank you for reproducing RetNet!
#### However, when I actually run the code, I find that training is slow: 5-6 times slower on the same task compared to a transformer (the transformer uses half…
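For what it's worth, when the parallel form is the bottleneck at long sequence lengths, the paper's chunkwise recurrent form is the usual remedy: process each chunk in parallel and carry a small recurrent state across chunks. A minimal single-head NumPy sketch (my own illustration, not this repo's implementation):

```python
import numpy as np

def chunkwise_retention(q, k, v, gamma, chunk_size=4):
    """Single-head retention, processed chunk_size tokens at a time.

    chunk_size=1 degenerates to the recurrent form and chunk_size=T to the
    parallel form, so every setting must produce the same output.
    """
    T, d = q.shape
    state = np.zeros((d, v.shape[1]))   # running sum of gamma-decayed k^T v
    outs = []
    for s in range(0, T, chunk_size):
        qc, kc, vc = q[s:s + chunk_size], k[s:s + chunk_size], v[s:s + chunk_size]
        B = qc.shape[0]
        n = np.arange(B)
        # Intra-chunk: causal decay mask D[n, m] = gamma^(n - m) for n >= m.
        D = np.where(n[:, None] >= n[None, :],
                     gamma ** (n[:, None] - n[None, :]), 0.0)
        inner = (qc @ kc.T * D) @ vc
        # Cross-chunk: contribution of all earlier chunks through the state.
        cross = (gamma ** (n + 1))[:, None] * (qc @ state)
        outs.append(inner + cross)
        # Decay the old state by gamma^B and fold in this chunk's keys/values.
        state = gamma ** B * state + (kc * (gamma ** (B - 1 - n))[:, None]).T @ vc
    return np.concatenate(outs)
```

This replaces the full O(T^2) decay mask with O(B^2) work per chunk plus a constant-size state, which is how the paper keeps long-sequence training tractable.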
-
Is it possible to train a RetNet using Transformers?
-
For example:
prompts = ["My dog is cute.", "My cat is very cute.", "Both my cat and dog are very cute."]
tokenizer.pad_token_id = tokenizer.eos_token_id
inputs = tokenizer(prompts, return_tensors…
-
I tried to run your examples on this page: https://github.com/syncdoth/RetNet
Running the first example, I got:
File "RetNet/retnet/modeling_retnet.py", line 114, in recurrent_retention
cu…
-
It seems the recurrent and parallel forward results are quite inconsistent for multiscale retention in the RetNet code. After debugging for a while, these three lines look suspicious.
…
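As a sanity check independent of the repo: here is a minimal single-head retention in NumPy in both forms. They are mathematically equivalent, so any divergence beyond floating-point noise indicates a bug in one of the two code paths:

```python
import numpy as np

def parallel_retention(q, k, v, gamma):
    # out_n = sum_{m <= n} gamma^(n - m) (q_n . k_m) v_m, via a causal decay mask.
    T = q.shape[0]
    n = np.arange(T)
    D = np.where(n[:, None] >= n[None, :],
                 gamma ** (n[:, None] - n[None, :]), 0.0)
    return (q @ k.T * D) @ v

def recurrent_retention(q, k, v, gamma):
    # The same computation token by token:
    # S_n = gamma * S_{n-1} + k_n^T v_n,   out_n = q_n S_n.
    state = np.zeros((q.shape[1], v.shape[1]))
    outs = []
    for qn, kn, vn in zip(q, k, v):
        state = gamma * state + np.outer(kn, vn)
        outs.append(qn @ state)
    return np.stack(outs)
```

Comparing a real implementation's two modes against a reference like this helps isolate whether the mismatch comes from the decay mask, the state update, or something layered on top (normalization, rotary scaling).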
-
My question is about the [RetNet paper](https://arxiv.org/pdf/2307.08621v3.pdf), which leads to the implementation here...
Why include the positional embedding updates directly in the multi-scale r…
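For context, the parallel retention in the paper folds the xPos-style rotation Θ directly into Q and K; writing its formulation from memory (so double-check against the PDF):

```latex
\mathrm{Retention}(X) = \big( (Q \odot \Theta)(K \odot \bar{\Theta})^{\top} \odot D \big) V,
\qquad \Theta_n = e^{in\theta}, \qquad
D_{nm} =
\begin{cases}
\gamma^{\,n-m}, & n \ge m \\
0, & n < m
\end{cases}
```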
-
Hey,
Does the implementation support Multiscale Retention in parallel mode? I did see that the number of heads is an input hyperparameter, but I am not able to tell whether MSR is completely implemented. T…
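For reference, multi-scale retention in the paper runs retention per head with a different fixed decay, gamma_h = 1 - 2^(-5 - h), and then concatenates the head outputs. A parallel-mode NumPy sketch of that structure (illustrative only; it omits the paper's group normalization and swish gate):

```python
import numpy as np

def head_decays(num_heads):
    # Fixed per-head decays from the RetNet paper: gamma_h = 1 - 2^(-5 - h),
    # giving each head a different effective memory length.
    return 1.0 - 2.0 ** (-5.0 - np.arange(num_heads))

def multiscale_retention_parallel(q, k, v, num_heads):
    # q, k, v: (T, d). Each head gets d // num_heads dims and its own decay mask.
    T, d = q.shape
    hd = d // num_heads
    n = np.arange(T)
    outs = []
    for h, gamma in enumerate(head_decays(num_heads)):
        sl = slice(h * hd, (h + 1) * hd)
        D = np.where(n[:, None] >= n[None, :],
                     gamma ** (n[:, None] - n[None, :]), 0.0)
        outs.append((q[:, sl] @ k[:, sl].T * D) @ v[:, sl])
    return np.concatenate(outs, axis=-1)
```

So the only per-head difference from single-head retention is the decay constant; the parallel mode just needs one decay mask per head.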
-
I would like to use this repo for my job, but I cannot do so until you add a license to it. Could you please add one soon?