-
While working on the HF_tokenizers branch, I found that I could get the validation to skip triggering at the 33% mark:
![image](https://github.com/DRAGNLabs/301r_retnet/assets/10404106/ffeab80e-7f0…
-
I am trying to use GLABlock with batch size 1, but encounter this error. How can I resolve this?
My current config:
```
config = GLAConfig(
    hidden_size=channels,
    num_hidden_layers=n_…
```
-
Hi,
Thanks for this great work! Could you provide a wrapper class for a full language model, similar to `MambaLMHeadModel` in Mamba and `RetNetDecoder` in RetNet? Thanks a lot!
-
Hello,
I am interested in training a model using the chunkwise feature of RetNet to handle long sequences. However, I couldn't find detailed instructions on how to do this in the documentation.
Cou…
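
For readers unfamiliar with the chunkwise form this question refers to, here is a minimal sketch in plain Python. It assumes a toy setting (scalar `q`, `k`, `v` per position and a single scalar decay `gamma`) and compares the chunkwise computation against the direct sum. The function names are hypothetical and are not the repository's API; the real implementation works on batched tensors with multiple heads.

```python
# Toy retention with scalar q, k, v per position and a scalar decay gamma.
# All names are illustrative assumptions, not a library API.

def retention_parallel(q, k, v, gamma):
    """Direct form: o_n = sum over m <= n of gamma^(n-m) * q_n * k_m * v_m."""
    return [
        sum(gamma ** (n - m) * q[n] * k[m] * v[m] for m in range(n + 1))
        for n in range(len(q))
    ]


def retention_chunkwise(q, k, v, gamma, chunk_size):
    """Same outputs, computed one chunk at a time with a carried state."""
    outputs = []
    state = 0.0  # decayed sum of k_m * v_m over all previous chunks
    for start in range(0, len(q), chunk_size):
        qc = q[start:start + chunk_size]
        kc = k[start:start + chunk_size]
        vc = v[start:start + chunk_size]
        size = len(qc)  # the last chunk may be shorter
        for j in range(size):
            # Contribution of positions inside the current chunk.
            inner = sum(
                gamma ** (j - i) * qc[j] * kc[i] * vc[i] for i in range(j + 1)
            )
            # Contribution of all earlier chunks, via the carried state.
            cross = qc[j] * gamma ** (j + 1) * state
            outputs.append(inner + cross)
        # Decay the old state across the chunk and add the chunk's own terms.
        state = gamma ** size * state + sum(
            gamma ** (size - 1 - i) * kc[i] * vc[i] for i in range(size)
        )
    return outputs
```

Only one chunk plus a fixed-size state is needed at any time, which is why the chunkwise form suits long sequences.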
-
Currently, a single value of alpha is used in retention. However, a ViR should use a different alpha for each head. That way, some heads focus more on the "local" / "more recent" to…
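
As a rough illustration of per-head decay, the sketch below builds one causal decay mask per head in plain Python. It assumes the per-head schedule from the RetNet paper, `1 - 2^(-5 - h)` for head index `h`; whether that schedule is the right choice for ViR is an assumption, and `head_alphas` and `decay_mask` are hypothetical helper names.

```python
def head_alphas(num_heads):
    """One decay value per head (schedule assumed from the RetNet paper)."""
    return [1.0 - 2.0 ** (-5 - h) for h in range(num_heads)]


def decay_mask(alpha, seq_len):
    """Causal mask with entry [n][m] = alpha^(n-m) for n >= m, else 0."""
    return [
        [alpha ** (n - m) if n >= m else 0.0 for m in range(seq_len)]
        for n in range(seq_len)
    ]


# One mask per head: a smaller alpha decays faster, so that head
# weights recent positions more heavily.
masks = [decay_mask(alpha, seq_len=4) for alpha in head_alphas(num_heads=3)]
```

Heads with alpha closer to 1 keep a longer effective history, while heads with a smaller alpha stay more local.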
-
### Problem Description
While running `lm_eval.simple_eval(...)`, I'm getting the following error:
`AttributeError: 'CausalLMOutputWithPast' object has no attribute 'log_softmax'` (Full trace…
-
In [the RetNet paper](https://arxiv.org/abs/2307.08621), equation (3) is simplified by making $\gamma$ a scalar, leading to the derivation of the following:
$$
o_n = \sum_{m=1}^{n} \gamma^{n-m} (Q…
-
I would like to know some applications and use cases of RetNet in LLMs, as it's currently difficult to find models and code that use RetNet for pretraining from scratch on the internet. Almost none of…
-
![image](https://github.com/berlino/gated_linear_attention/assets/139205286/e4122d60-24bf-4557-b8b9-8f31aea38ce0)
I wanted to get St, but I couldn't find its exact location. I returned memory_cache …
-
Hello,
I've been playing with this architecture on [nanoGPT](https://github.com/karpathy/nanoGPT/tree/master). While I can get other architectures to play nicely there (e.g. [RMT](https://github.co…