retnet Search Results - Githubissues

165 results for retnet

DRAGNLabs/301r_retnet #14

Update Validation Trigger Equation

While working on the HF_tokenizers branch, I found that I could get the validation to skip triggering at the 33% mark: ![image](https://github.com/DRAGNLabs/301r_retnet/assets/10404106/ffeab80e-7f0…

nprisbrey updated 9 months ago
1
sustcsonglin/flash-linear-attention #22

AssertionError('All values in both first input shape ([const…

I am trying to use GLABlock with batch size 1, but encounter this error. How can I resolve this? My current config: ``` config = GLAConfig( hidden_size=channels, num_hidden_layers=n_…

yxchng updated 5 months ago
17
berlino/gated_linear_attention #4

A Full LM class

Hi, Thanks for this great work! I wonder if you could provide a wrapper for a full language model class, like in Mamba and RetNet they have `MambaLMHeadModel` and `RetNetDecoder`. Thanks a lot!

Cranial-XIX updated 10 months ago
2
fkodom/yet-another-retnet #23

How to train with long sequences using chunkwise feature of …

Hello, I am interested in training a model using the chunkwise feature of RetNet to handle long sequences. However, I couldn't find detailed instructions on how to do this in the documentation. Cou…

calliope-pro updated 10 months ago
8
BrianPulfer/vision-retention-networks #8

Multiple Alphas

Currently, a single value of alpha is used in retention. However, a ViR should use different alphas for each head. This makes it such that some heads are more focused in the "local" / "more recent" to…

BrianPulfer updated 10 months ago
1
EleutherAI/lm-evaluation-harness #1623

Issue with HF models not finding logits for log_softmax() du…

### Problem Description While running `lm_eval.simple_eval(...)`, I'm getting the following error: ```AttributeError: 'CausalLMOutputWithPast' object has no attribute 'log_softmax'``` (Full trace…

DrewGalbraith updated 7 months ago
3
microsoft/unilm #1366

[RetNet] Equation in the paper

In [the RetNet paper](https://arxiv.org/abs/2307.08621), equation (3) is simplified by making $\gamma$ a scalar, leading to the derivation of the following: $$ o_n = \sum_{m=1}^{n} \gamma^{n-m} (Q…

kenkenpa2126 updated 11 months ago
2
microsoft/unilm #1353

RetNet applications in LLMs?

I would like to know some applications and use cases of RetNet in LLMs, as it's currently difficult to find models and code that use RetNet for pretraining from scratch on the internet. Almost none of…

zhoumengbo updated 1 year ago
1
berlino/gated_linear_attention #6

How to get St?

![image](https://github.com/berlino/gated_linear_attention/assets/139205286/e4122d60-24bf-4557-b8b9-8f31aea38ce0) I wanted to get St but I couldn't find its exact location. I returned memory_cache …

JL-er updated 8 months ago
10
berlino/gated_linear_attention #8

Tips for training from scratch?

Hello, I've been playing with this architecture on [nanoGPT](https://github.com/karpathy/nanoGPT/tree/master). While I can get other architectures to play nicely there (e.g. [RMT](https://github.co…

luchris429 updated 8 months ago
10
