-
Implement attention algorithms:
- [ ] Flash attention 2
- [ ] Flash attention 3
- [ ] Other attention algorithms
-
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[](https://localhost:8080/#) in ()
19
…
```
-
```
{'mid_block_add_attention', 'use_quant_conv', 'scaling_factor', 'force_upcast', 'shift_factor', 'latents_std', 'use_post_quant_conv', 'latents_mean'} was not found in config. Values will be initi…
```
-
Does your code have a function to analyze attention scores, or does this need to be observed in the **Transformer** class?
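For reference, a minimal sketch of one generic way to observe attention scores in PyTorch, assuming an `nn.MultiheadAttention`-style layer (the project in question may expose something different):

```Python
import torch
import torch.nn as nn

# Generic illustration only: nn.MultiheadAttention returns the score matrix
# directly when need_weights=True (per-head scores with average_attn_weights=False).
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
x = torch.randn(2, 10, 64)
out, attn_scores = mha(x, x, x, need_weights=True, average_attn_weights=False)
print(attn_scores.shape)  # (batch, num_heads, query_len, key_len)
```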
-
Inspired by [this paper](https://arxiv.org/abs/2405.14862), we're exploring ways to bootstrap a bidirectional-context LLM from a decoder-only causal LLM (e.g. llama-3). This is very easy to do in hu…
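For context, the core mechanical change is just not applying the causal mask in the attention layers; a toy sketch with PyTorch's SDPA (the real patch to a llama-3 checkpoint would touch the model's attention modules and would typically be followed by fine-tuning, since the pretrained weights have only ever seen causal attention):

```Python
import torch
import torch.nn.functional as F

# Toy illustration: same projections, same weights; the only difference between
# decoder-style (causal) and bidirectional attention here is the mask.
q = k = v = torch.randn(1, 4, 6, 16)  # (batch, heads, seq_len, head_dim)

causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)   # decoder-only
bidir_out  = F.scaled_dot_product_attention(q, k, v, is_causal=False)  # bidirectional
```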
-
Hi, I want to use FlexAttention for ALiBi with padding (no bias).
If seq_len is 5, I want to build an ALiBi tensor like the one below, i.e. an ALiBi tensor over seq_len where the last item is not penalized:
```
0 …
```
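A minimal sketch of how an ALiBi bias can be expressed as a FlexAttention `score_mod` (the sizes, slope schedule, and which positions are exempt from the penalty are illustrative assumptions; padding would additionally be handled with a `mask_mod`/`BlockMask`):

```Python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 4, 5, 16                      # illustrative sizes (seq_len = 5)
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)

# One slope per head (standard geometric ALiBi schedule, assumed here).
slopes = torch.tensor([2.0 ** -(h + 1) for h in range(H)])

def alibi_score_mod(score, b, h, q_idx, kv_idx):
    # Penalize each key proportionally to its distance behind the query;
    # keys at or after the query position get no penalty in this sketch.
    distance = (q_idx - kv_idx).clamp(min=0)
    return score - slopes[h] * distance

out = flex_attention(q, k, v, score_mod=alibi_score_mod)
print(out.shape)  # (1, 4, 5, 16)
```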
-
# Summary
This can have a large performance impact in real Attention modules.
The most common pattern (derived from nano-gpt) is:
```Python
import torch
import torch.nn as nn
import torch.nn.funct…
```
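A minimal sketch of what that pattern usually looks like in full, assuming the snippet refers to the nanoGPT-style fused QKV projection feeding `F.scaled_dot_product_attention`:

```Python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Fused QKV projection feeding scaled_dot_product_attention (nanoGPT-style)."""
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # one matmul for q, k, v
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # A fused/flash kernel is chosen automatically where available.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

x = torch.randn(2, 8, 64)
print(CausalSelfAttention(64, 4)(x).shape)  # torch.Size([2, 8, 64])
```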
-
### Request description
The scale parameter was added to the AttentionOp/OnlineAttentionOp as a stopgap solution to make models work. Now that we are in a better place to support attention, it's time…
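For reference, the scale being discussed is the multiplier applied to the query-key logits before the softmax; a plain-PyTorch sketch of standard scaled dot-product attention with an explicit `scale` argument (the op's exact semantics may differ):

```Python
import math
import torch

def attention(q, k, v, scale=None):
    # Default scale is 1/sqrt(head_dim), i.e. the usual softmax(QK^T / sqrt(d_k)) V.
    scale = 1.0 / math.sqrt(q.shape[-1]) if scale is None else scale
    return torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1) @ v
```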
-
I have previous code using the old trl package:
```
@dataclass
class mDPODataCollatorBunny(DPODataCollatorWithPadding):
    def __init__(self, tokenizer, **kwargs):
        super().__init__(*…
```