-
Hello,
Could someone enlighten me on the rationale behind this line of code, i.e. why "_1 - 2 ** (-5 -_" etc.? Thank you.
https://github.com/microsoft/torchscale/blob/881d03079da7b0c52ba0a473…
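For context, the quoted expression looks like the per-head decay described in the RetNet paper, gamma = 1 - 2 ** (-5 - arange(0, h)), which gives each retention head a different decay rate. The sketch below is a plain-Python illustration under that assumption, not the code in the linked file; `NUM_HEADS` and the `horizons` helper are hypothetical names chosen for the example.

```python
# Sketch only: NUM_HEADS = 8 is an assumed head count, not taken from the linked file.
NUM_HEADS = 8

# Per-head decay: gamma_i = 1 - 2 ** (-5 - i), for i = 0 .. NUM_HEADS - 1.
gammas = [1 - 2 ** (-5 - i) for i in range(NUM_HEADS)]

# The weights gamma ** n (n = 0, 1, 2, ...) sum to 1 / (1 - gamma),
# which serves as a rough "effective memory length" in tokens.
horizons = [1 / (1 - g) for g in gammas]

for i, (g, h) in enumerate(zip(gammas, horizons)):
    print(f"head {i}: gamma = {g:.6f}, horizon ~ {h:.0f} tokens")
```

Under this reading, the offset -5 sets the shortest horizon (about 32 tokens for head 0), and each further head doubles it, so the heads cover a range of time scales.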
-
I believe there is some kind of normalization mistake in chunk recurrent retention. Its output does not match the output of the simple recurrent and parallel retention. Recurrent retention also …
-
Thanks for the well-written package! The official RetNet implementation has had several updates; see https://github.com/microsoft/unilm/blob/master/retnet/README.md#changelog .
-
I am running scripts/benchmark_inference on the CPU (an M2 Mac running macOS Ventura). There are several issues with the code:
Could you please run the code on the CPU with a version of Torch which does…
-
In the RetNet code,
https://github.com/microsoft/torchscale/blob/main/torchscale/architecture/retnet.py#L25
`inv_freq` (called `angle` in this code) is created using `torch.linspace(0, 1, dim/2)`,
but…
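To make the quoted expression concrete, here is a small plain-Python sketch of the exponents that `torch.linspace(0, 1, dim/2)` produces, next to the exponents of the common rotary convention `arange(0, dim, 2) / dim`. The comparison target, the base of 10000, and the names `HALF_DIM`, `BASE`, and `linspace` are assumptions made for illustration; the excerpt above is cut off before it says what it contrasts against.

```python
# Sketch only: HALF_DIM and BASE are assumed values for illustration.
HALF_DIM = 4      # stands in for dim/2
BASE = 10000.0    # assumed frequency base

def linspace(start, stop, num):
    """Pure-Python stand-in for torch.linspace: num evenly spaced points, endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]

# Exponents in the style quoted above: 0, 1/3, 2/3, 1 (the endpoint 1 is included).
linspace_exps = linspace(0.0, 1.0, HALF_DIM)

# Exponents in the arange(0, dim, 2) / dim convention: 0, 1/4, 2/4, 3/4 (endpoint excluded).
arange_exps = [k / HALF_DIM for k in range(HALF_DIM)]

# Corresponding inverse frequencies, 1 / BASE ** exponent.
linspace_freq = [1.0 / BASE ** e for e in linspace_exps]
arange_freq = [1.0 / BASE ** e for e in arange_exps]

for lf, af in zip(linspace_freq, arange_freq):
    print(f"linspace: {lf:.6f}   arange/dim: {af:.6f}")
```

The two spacings agree at the first entry and drift apart after it, since the `linspace` form divides the interval into `dim/2 - 1` steps and reaches the exponent 1, while the `arange` form uses `dim/2` steps and stops short of it.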
-
I was training a RetNet model using your codebase,
but I found that the word embedding layers are never initialized.
As a result, the initial loss was very high (3000+ for the 7B model).
I think we need…
-
I am running on a Mac and like the clarity of your code. My device is 'cpu', since I don't have CUDA on a Mac notebook. Running `retnet.py` works fine. However, when running `train_project_gutenbur…
-
Hi, thank you for your great work. The RetNet version you provided is the clearest and easiest to understand that I have seen.
Have you ever tried using the RetNet module for visual t…
-
https://www.zhihu.com/question/612761391/answer/3128755930
I saw this answer on Zhihu. Will RetNet continue to be supported going forward?
-
Hi authors,
Thanks for the great work. I am very interested in your work and would like to give the RegBench and those sequence models a try. I wonder if you could further elaborate on how to repro…