-
**Description**
We found that Triton + TensorRT performs very differently under stable QPS and under uneven QPS, as shown below:
- uneven QPS
(1) QPS
![image](https://github.com/triton-inference-se…
-
Hi,
I found that the `unpad_input` function makes CUDA graph capture fail when a `key_attention_mask` is provided.
https://github.com/HazyResearch/flash-attention/blob/72ad03eaa661f6bf3a14c855316c27fbab4f…
-
Hi, I want to freeze the model so that I can run unit tests. When I run the command
"g2p-seq2seq --model_dir model_folder_bre --freeze"
I get the following error:
AssertionError: transformer/parallel_0_5/transf…
-
Hello, sorry to bother you. I noticed that your code doesn't include a section for extracting typeset data from the dataset. Could you please provide that part of the code?
-
I'd like to implement a graph attention mechanism a la [this paper](http://arxiv.org/abs/1710.10903).
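
To make the request concrete, here is a minimal sketch of a single-head graph attention layer in plain Python, following the attention formulation in the linked paper: score each edge with a LeakyReLU over a learned vector applied to the concatenated projected features, apply a softmax over each node's neighborhood, then take the weighted sum. The function names (`gat_layer`, `matvec`, `leaky_relu`), the adjacency-list input format, and the omission of multi-head attention, dropout, and the output nonlinearity are my own simplifications for illustration, not anything this project provides.

```python
import math


def leaky_relu(x, slope=0.2):
    # Negative slope of 0.2 is the value used in the paper; treat it as a tunable.
    return x if x > 0 else slope * x


def matvec(W, h):
    # W is a list of rows with shape (out_dim, in_dim); h has length in_dim.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer (illustrative sketch).

    features:  list of node feature vectors, each of length in_dim
    neighbors: neighbors[i] lists the nodes j that node i attends to;
               it must be non-empty (include i itself as a self-loop)
    W:         shared weight matrix, out_dim x in_dim
    a:         attention vector of length 2 * out_dim
    Returns (outputs, attention), where attention[i] holds the softmax
    weights of node i over neighbors[i], in the same order.
    """
    z = [matvec(W, h) for h in features]  # projected features W h_i
    out_dim = len(W)
    outputs, attention = [], []
    for i, nbrs in enumerate(neighbors):
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); z[i] + z[j] is list concatenation.
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood of node i.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attention.append(alpha)
        # h_i' = sum_j alpha_ij * W h_j (output nonlinearity omitted here).
        outputs.append(
            [sum(al * z[j][k] for al, j in zip(alpha, nbrs)) for k in range(out_dim)]
        )
    return outputs, attention
```

With an all-zero attention vector `a`, every edge gets the same score, so the layer reduces to a plain mean over each neighborhood. That makes a convenient sanity check before learned parameters are plugged in.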
-
**Is your feature request related to a problem?**
Since Guardrails policies may include both audit and deny actions, it is important to surface both audits and denials for the cluster admin to understa…
-
> the quadratic complexity of the self-attention module restricts Graphormer’s application on large graphs.
The paper describes Graphormer as not applicable to large graphs. What is the maximum num…
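
For a rough sense of what "quadratic" means here, the sketch below estimates the memory taken by the dense attention score matrix alone, assuming one N x N matrix per head. The function name `attention_matrix_bytes`, the default head count, and the 4-byte element size are illustrative assumptions, not Graphormer's actual configuration, and the estimate ignores every other activation and the batch dimension.

```python
def attention_matrix_bytes(num_nodes, num_heads=1, bytes_per_element=4):
    """Bytes needed to store dense attention scores for one graph.

    Assumes one num_nodes x num_nodes score matrix per head and
    bytes_per_element bytes per score (4 for float32).
    """
    return num_nodes * num_nodes * num_heads * bytes_per_element


if __name__ == "__main__":
    # Growing the graph 10x grows this term 100x.
    for n in (1_000, 10_000, 100_000):
        gib = attention_matrix_bytes(n) / 2**30
        print(f"{n:>7} nodes: {gib:.3f} GiB per head")
```

Under these assumptions the practical limit depends on available accelerator memory, the number of heads and layers, and the batch size, so the numbers are only an order-of-magnitude guide.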
-
Hey there,
I just wanted to draw attention to an issue I discovered. Using the latest version, I tried to display a graph inside of a WKWebView inside an iOS app I'm developing. On one of my testin…
-
## Describe the bug
1>C:\Program Files (x86)\Microsoft SDKs\UWPNuGetPackages\microsoft.net.native.compiler\2.2.12-rel-31116-00\tools\Microsoft.NetNative.targets(809,5): warning : MCG : wa…
-
Framework not specified. Using pt to export the model.
====Exporting IR=====
Loading checkpoint shards: 0%| | 0/7 [00:00