-
My device is an RTX 4090; I assumed its architecture is consistent with the H100's Hopper architecture. But on the homepage it says "Requirements: H100 / H800 GPU, CUDA >= 12.3."
I would like to know if flash attentio…
-
### Model description
"Attention Is All You Need" is a landmark 2017 research paper authored by eight scientists working at Google. It expanded on the 2014 attention mechanisms proposed by Bah…
-
I would like to build a generative AI that is more advanced than the Claude Sonnet or OpenAI o1 models. I would like to use advanced mechanisms from OpenAI, Anthropic, and other sources to build the most adv…
-
## Overview
The focus of this code review will be the AuditedBalanceInput page and the BudgePlInput page.
Please pay attention to:
* JavaScript issues
* React components
#…
-
-
Many modern architectures use either GQA or MQA rather than MHA, but `dot_product_attention` allows only MHA because it enforces that `query`, `key`, and `value` have the same number of heads:
https://gi…
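For context, a minimal NumPy sketch of the workaround this constraint forces (the function name is hypothetical, not the library's API): GQA can be emulated on top of an MHA-only kernel by repeating each KV head until the head counts match.

```python
import numpy as np

def gqa_via_mha(q, k, v):
    """Emulate grouped-query attention with an MHA-only kernel.

    Hypothetical helper, not `dot_product_attention` itself:
    q has shape (n_q_heads, seq, d); k and v have shape
    (n_kv_heads, seq, d) with n_q_heads % n_kv_heads == 0.
    """
    n_q, _, d = q.shape
    rep = n_q // k.shape[0]  # query heads per KV head
    # Repeat each KV head so every query head sees a matching KV head,
    # producing the equal-head-count shape an MHA-only kernel requires.
    k = np.repeat(k, rep, axis=0)
    v = np.repeat(v, rep, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q_heads, seq, d)
```

With `n_kv_heads == 1` this reduces to MQA. Frameworks with native GQA support do the equivalent of this repeat internally, typically without materializing the copies.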
-
### 🐛 Describe the bug
When using `torch.compile` on `torch.nn.functional.scaled_dot_product_attention` with length 1, a RuntimeError occurs during the backward pass:
```python
import torch
from…
-
Hey @sunovivid, great work, and congrats on the paper's acceptance at ECCV!
I would like to reproduce the results, and I have the following questions related to the hyperparameters:
1. **How man…
-
### Type of issue
Other (describe below)
### Description
_This issue has been moved from [a ticket on Developer Community](https://developercommunity.visualstudio.com/t/Feedback-on-the-Common-web…
-
### Motivation
As vLLM supports more and more models and features, they require different attention, scheduler, executor, and input/output processor implementations. These modules are becoming increasingly com…