-
Implement attention algorithms:
- [ ] Flash attention 2
- [ ] Flash attention 3
- [ ] Other attention algorithms
-
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[](https://localhost:8080/#) in ()
19
…
```
-
```
{'mid_block_add_attention', 'use_quant_conv', 'scaling_factor', 'force_upcast', 'shift_factor', 'latents_std', 'use_post_quant_conv', 'latents_mean'} was not found in config. Values will be initi…
```
-
Does your code have a function to analyze attention scores, or does this need to be observed in the **Transformer** class?
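For reference, a minimal sketch of one generic way to observe attention scores in PyTorch, assuming an `nn.MultiheadAttention`-style layer (the project in question may expose something different):

```Python
import torch
import torch.nn as nn

# Generic illustration only: nn.MultiheadAttention returns the score matrix
# directly when need_weights=True (per-head scores with average_attn_weights=False).
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.randn(2, 10, 64)
out, attn_scores = mha(x, x, x, need_weights=True, average_attn_weights=False)
print(attn_scores.shape)  # (batch, num_heads, query_len, key_len)
```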
-
Inspired by [this paper](https://arxiv.org/abs/2405.14862), we're exploring ways to bootstrap a bidirectional-context LLM from a decoder-only causal LLM (e.g. llama-3). This is very easy to do in hu…
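For context, the core mechanical change is just not applying the causal mask in the attention layers; a toy sketch with PyTorch's SDPA (the real patch to a llama-3 checkpoint would touch the model's attention modules and would typically be followed by fine-tuning, since the pretrained weights have only ever seen causal attention):

```Python
import torch
import torch.nn.functional as F

# Toy illustration: same projections, same weights; the only difference between
# decoder-style (causal) and bidirectional attention here is the mask.
q = k = v = torch.randn(1, 4, 6, 16)  # (batch, heads, seq_len, head_dim)

causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)   # decoder-only
bidir_out  = F.scaled_dot_product_attention(q, k, v, is_causal=False)  # bidirectional
```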
-
Hi, I want to use FlexAttention for ALiBi with padding (no bias).
If seq_len is 5, I want to build an ALiBi tensor like the one below, i.e. an ALiBi tensor over seq_len where the last item is not penalized:
```
0 …
```
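A minimal sketch of how an ALiBi bias can be expressed as a FlexAttention `score_mod` (the sizes, slope schedule, and which positions are exempt from the penalty are illustrative assumptions; padding would additionally be handled with a `mask_mod`/`BlockMask`):

```Python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 4, 5, 16                      # illustrative sizes (seq_len = 5)
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

# One slope per head (standard geometric ALiBi schedule, assumed here).
slopes = torch.tensor([2.0 ** -(h + 1) for h in range(H)])

def alibi_score_mod(score, b, h, q_idx, kv_idx):
    # Penalize each key proportionally to its distance behind the query;
    # keys at or after the query position get no penalty in this sketch.
    distance = (q_idx - kv_idx).clamp(min=0)
    return score - slopes[h] * distance

out = flex_attention(q, k, v, score_mod=alibi_score_mod)
print(out.shape)  # (1, 4, 5, 16)
```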
-
# Summary
This can have a large performance impact in real Attention modules.
The most common pattern (derived from nano-gpt) is:
```Python
import torch
import torch.nn as nn
import torch.nn.funct…
```
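A minimal sketch of what that pattern usually looks like in full, assuming the snippet refers to the nanoGPT-style fused QKV projection feeding `F.scaled_dot_product_attention`:

```Python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Fused QKV projection feeding scaled_dot_product_attention (nanoGPT-style)."""
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # one matmul for q, k, v
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # A fused/flash kernel is chosen automatically where available.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

x = torch.randn(2, 8, 64)
print(CausalSelfAttention(64, 4)(x).shape)  # torch.Size([2, 8, 64])
```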
-
### Request description
The scale parameter was added to the AttentionOp/OnlineAttentionOp as a stopgap solution to make models work. Now that we are in a better place to support attention, it's time…
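For reference, the scale being discussed is the multiplier applied to the query-key logits before the softmax; a plain-PyTorch sketch of standard scaled dot-product attention with an explicit `scale` argument (the op's exact semantics may differ):

```Python
import math
import torch

def attention(q, k, v, scale=None):
    # Default scale is 1/sqrt(head_dim), i.e. the usual softmax(QK^T / sqrt(d_k)) V.
    scale = 1.0 / math.sqrt(q.shape[-1]) if scale is None else scale
    return torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1) @ v
```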
-
I have previous code using the old trl package:
```
@dataclass
class mDPODataCollatorBunny(DPODataCollatorWithPadding):
    def __init__(self, tokenizer, **kwargs):
        super().__init__(*…
```