-
I tried to compile single_prefill_with_kv_cache using torch.compile.
```Python
import torch
from flashinfer import single_prefill_with_kv_cache
data_type = torch.bfloat16
QH=64
KH=8
S=1024
…
```
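For context, a minimal sketch of how this call might be wrapped in torch.compile (not the original, truncated script): the head_dim of 128, the causal=True flag, and the device placement are my assumptions, and the exact single_prefill_with_kv_cache signature may differ between flashinfer versions.
```Python
import torch
from flashinfer import single_prefill_with_kv_cache

# Assumed shapes and head_dim; layout follows flashinfer's
# [qo_len, num_qo_heads, head_dim] / [kv_len, num_kv_heads, head_dim] convention.
data_type = torch.bfloat16
QH, KH, S, D = 64, 8, 1024, 128

q = torch.randn(S, QH, D, dtype=data_type, device="cuda")
k = torch.randn(S, KH, D, dtype=data_type, device="cuda")
v = torch.randn(S, KH, D, dtype=data_type, device="cuda")

def attn(q, k, v):
    # GQA prefill attention without a paged KV cache
    return single_prefill_with_kv_cache(q, k, v, causal=True)

compiled_attn = torch.compile(attn)  # the step that triggers the issue
out = compiled_attn(q, k, v)
```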
-
PyTorch now has some support for representing variable-length (varlen) sequences, and it is supported to some extent by Hugging Face:
- https://medium.com/pytorch/bettertransformer-out-of-the-box-performance-for-huggingface-transfor…
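Assuming the varlen support referred to here is the NestedTensor API, a minimal sketch of what it looks like on the PyTorch side (the sequence lengths and feature size are arbitrary):
```Python
import torch

# Three sequences of different lengths, no padding needed.
seqs = [torch.randn(5, 16), torch.randn(9, 16), torch.randn(3, 16)]

# Pack them into a single nested tensor with a ragged first dimension.
nt = torch.nested.nested_tensor(seqs)

# Some ops (e.g. nn.MultiheadAttention, scaled_dot_product_attention)
# accept nested tensors and skip padded positions entirely.
print(nt.is_nested)  # True
```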
-
```
python run_gpu.py "openai/whisper-medium" "whisper-medium-onnx-int4-inc" "ukrainian_speech.wav"
```
You are using a model of type whisper to instantiate a model of type . This is not supported for…
-
Hi, I want to freeze the model to run unit tests. When I run the command
"g2p-seq2seq --model_dir model_folder_bre --freeze"
the following error occurs:
AssertionError: transformer/parallel_0_5/transf…
-
### 🐛 Describe the bug
I found that scaled_dot_product_attention() cannot run backward(). I get:
RuntimeError: derivative for aten::_scaled_dot_product_flash_attention_backward is not impleme…
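A minimal repro sketch of this kind of failure (the shapes, fp16 dtype, and the backend workaround at the end are my assumptions, not taken from the original report):
```Python
import torch
import torch.nn.functional as F

# [batch, heads, seq, head_dim]; half precision on CUDA routes to the flash backend.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16, requires_grad=True)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16, requires_grad=True)

out = F.scaled_dot_product_attention(q, k, v)
out.sum().backward()  # where the "derivative ... not implemented" RuntimeError is raised

# Possible workaround: force the math backend, which has a backward
# (older-style context manager; newer PyTorch versions use torch.nn.attention.sdpa_kernel).
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_mem_efficient=False, enable_math=True):
    out = F.scaled_dot_product_attention(q, k, v)
    out.sum().backward()
```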
-
> the quadratic complexity of the self-attention module restricts Graphormer’s application on large graphs.
The paper describes Graphormer as not applicable to large graphs. What is the maximum num…
-
I have followed your instructions on GitHub and used the following configurations for S-TR and T-TR
respectively, but I only got 83% top-1 acc for S-TR and 58% top-1 acc for T-TR (much lower than the …
-
Hello, I recently read your paper "Microblog-HAN: A micro-blog rumor detection model based on heterogeneous graph attention network". You shared the Weibo 2021 and Weibo 2022 datasets on GitHub, but the Weibo 2022 dataset is incomplete. Could you please share the full dataset? My email is guoboyu_00@163.com. Thank you very much.
-
### Describe the bug
Training GSA and RWKV occasionally produces NaN gradients; this is rare at the beginning, but becomes more frequent as training progresses.
I checked paramete…
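For reference, a sketch of the kind of parameter/gradient check described above, assuming a standard PyTorch training loop (the model and the call site are placeholders):
```Python
import torch

def report_nan_grads(model: torch.nn.Module) -> bool:
    """Print which parameters received NaN/Inf gradients after loss.backward()."""
    found = False
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            print(f"non-finite gradient in {name}")
            found = True
    return found

# Usage inside the training loop, right after loss.backward():
#   if report_nan_grads(model):
#       ...  # dump the batch / step index for debugging
```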
-
I'd like to implement a graph attention mechanism à la [this paper](http://arxiv.org/abs/1710.10903).
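For reference, a minimal dense-adjacency sketch of the attention described in that paper (Veličković et al., GAT): scores e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), softmax-normalized over each node's neighbors. This is an illustrative toy, not an optimized or sparse implementation.
```Python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head GAT layer over a dense adjacency matrix (for illustration only)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared linear transform W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention vector a
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, h, adj):
        # h: [N, in_dim]; adj: [N, N] with 1 where an edge (or self-loop) exists
        Wh = self.W(h)                                     # [N, out_dim]
        N = Wh.size(0)
        # All pairwise concatenations [Wh_i || Wh_j] -> [N, N, 2*out_dim]
        pairs = torch.cat([Wh.unsqueeze(1).expand(-1, N, -1),
                           Wh.unsqueeze(0).expand(N, -1, -1)], dim=-1)
        e = self.leaky_relu(self.a(pairs)).squeeze(-1)     # [N, N] raw scores
        e = e.masked_fill(adj == 0, float("-inf"))         # attend only to neighbors
        alpha = F.softmax(e, dim=-1)                       # normalize over neighbors
        return alpha @ Wh                                  # [N, out_dim]

# Toy usage: 4 nodes in a ring, with self-loops so every row has a neighbor.
h = torch.randn(4, 8)
adj = torch.eye(4) + torch.roll(torch.eye(4), 1, dims=1) + torch.roll(torch.eye(4), -1, dims=1)
out = GraphAttentionLayer(8, 16)(h, adj)
print(out.shape)  # torch.Size([4, 16])
```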