-
I tried to compile single_prefill_with_kv_cache using torch.compile.
```Python
import torch
from flashinfer import single_prefill_with_kv_cache
data_type = torch.bfloat16
QH=64
KH=8
S=1024
…
```
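For context, a minimal sketch of how this call might be wrapped in torch.compile (not the original, truncated script): the head_dim of 128, the causal=True flag, and the device placement are my assumptions, and the exact single_prefill_with_kv_cache signature may differ between flashinfer versions.
```Python
import torch
from flashinfer import single_prefill_with_kv_cache

# Assumed shapes and head_dim; layout follows flashinfer's
# [qo_len, num_qo_heads, head_dim] / [kv_len, num_kv_heads, head_dim] convention.
data_type = torch.bfloat16
QH, KH, S, D = 64, 8, 1024, 128

q = torch.randn(S, QH, D, dtype=data_type, device="cuda")
k = torch.randn(S, KH, D, dtype=data_type, device="cuda")
v = torch.randn(S, KH, D, dtype=data_type, device="cuda")

def attn(q, k, v):
    # GQA prefill attention without a paged KV cache
    return single_prefill_with_kv_cache(q, k, v, causal=True)

compiled_attn = torch.compile(attn)  # the step that triggers the issue
out = compiled_attn(q, k, v)
```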
-
PyTorch now has some support for representing variable-length (varlen) sequences, and it is supported to some extent by Hugging Face:
- https://medium.com/pytorch/bettertransformer-out-of-the-box-performance-for-huggingface-transfor…
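Assuming the varlen support referred to here is the NestedTensor API, a minimal sketch of what it looks like on the PyTorch side (the sequence lengths and feature size are arbitrary):
```Python
import torch

# Three sequences of different lengths, no padding needed.
seqs = [torch.randn(5, 16), torch.randn(9, 16), torch.randn(3, 16)]

# Pack them into a single nested tensor with a ragged first dimension.
nt = torch.nested.nested_tensor(seqs)

# Some ops (e.g. nn.MultiheadAttention, scaled_dot_product_attention)
# accept nested tensors and skip padded positions entirely.
print(nt.is_nested)  # True
```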
-
```
python run_gpu.py "openai/whisper-medium" "whisper-medium-onnx-int4-inc" "ukrainian_speech.wav"
```
You are using a model of type whisper to instantiate a model of type . This is not supported for…
-
Hi, I want to freeze the model to run unit tests. When I run the command
"g2p-seq2seq --model_dir model_folder_bre --freeze"
the following error occurs:
AssertionError: transformer/parallel_0_5/transf…
-
### 🐛 Describe the bug
I found that scaled_dot_product_attention() cannot run backward(). I get:
RuntimeError: derivative for aten::_scaled_dot_product_flash_attention_backward is not impleme…
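A minimal repro sketch of this kind of failure (the shapes, fp16 dtype, and the backend workaround at the end are my assumptions, not taken from the original report):
```Python
import torch
import torch.nn.functional as F

# [batch, heads, seq, head_dim]; half precision on CUDA routes to the flash backend.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16, requires_grad=True)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16, requires_grad=True)

out = F.scaled_dot_product_attention(q, k, v)
out.sum().backward()  # where the "derivative ... not implemented" RuntimeError is raised

# Possible workaround: force the math backend, which has a backward
# (older-style context manager; newer PyTorch versions use torch.nn.attention.sdpa_kernel).
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_mem_efficient=False, enable_math=True):
    out = F.scaled_dot_product_attention(q, k, v)
    out.sum().backward()
```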
-
> the quadratic complexity of the self-attention module restricts Graphormer’s application on large graphs.
The paper describes Graphormer as not applicable to large graphs. What is the maximum num…
-
I have followed your instructions on GitHub and used the following configurations for S-TR and T-TR
respectively, but I only got 83% top-1 acc for S-TR and 58% top-1 acc for T-TR (much lower than the …
-
Hello, I recently read your paper "Microblog-HAN: A micro-blog rumor detection model based on heterogeneous graph attention network". You shared the Weibo 2021 and Weibo 2022 datasets on GitHub, but the Weibo 2022 dataset is incomplete. Could you please share the full dataset? My email is guoboyu_00@163.com. Thank you very much.
-
### Describe the bug
Training GSA and RWKV occasionally produces NaN gradients; this is rare at the beginning, but becomes more frequent as training progresses.
I checked paramete…
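For reference, a sketch of the kind of parameter/gradient check described above, assuming a standard PyTorch training loop (the model and the call site are placeholders):
```Python
import torch

def report_nan_grads(model: torch.nn.Module) -> bool:
    """Print which parameters received NaN/Inf gradients after loss.backward()."""
    found = False
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            print(f"non-finite gradient in {name}")
            found = True
    return found

# Usage inside the training loop, right after loss.backward():
#   if report_nan_grads(model):
#       ...  # dump the batch / step index for debugging
```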
-
I'd like to implement a graph attention mechanism à la [this paper](http://arxiv.org/abs/1710.10903).
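For reference, a minimal dense-adjacency sketch of the attention described in that paper (Veličković et al., GAT): scores e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), softmax-normalized over each node's neighbors. This is an illustrative toy, not an optimized or sparse implementation.
```Python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head GAT layer over a dense adjacency matrix (for illustration only)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared linear transform W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention vector a
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, h, adj):
        # h: [N, in_dim]; adj: [N, N] with 1 where an edge (or self-loop) exists
        Wh = self.W(h)                                     # [N, out_dim]
        N = Wh.size(0)
        # All pairwise concatenations [Wh_i || Wh_j] -> [N, N, 2*out_dim]
        pairs = torch.cat([Wh.unsqueeze(1).expand(-1, N, -1),
                           Wh.unsqueeze(0).expand(N, -1, -1)], dim=-1)
        e = self.leaky_relu(self.a(pairs)).squeeze(-1)     # [N, N] raw scores
        e = e.masked_fill(adj == 0, float("-inf"))         # attend only to neighbors
        alpha = F.softmax(e, dim=-1)                       # normalize over neighbors
        return alpha @ Wh                                  # [N, out_dim]

# Toy usage: 4 nodes in a ring, with self-loops so every row has a neighbor.
h = torch.randn(4, 8)
adj = torch.eye(4) + torch.roll(torch.eye(4), 1, dims=1) + torch.roll(torch.eye(4), -1, dims=1)
out = GraphAttentionLayer(8, 16)(h, adj)
print(out.shape)  # torch.Size([4, 16])
```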