-
Compared to the same structure (the QKV attention) that I implemented with TensorFlow, Triton runs 10 to 20 times slower. With the help of Nsight Systems, I found that cudaMemcpySync takes up much of the time whil…
-
### 🚀 The feature, motivation and pitch
Modify this line, https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/ops/rms_norm.py#L306, changing the sum in PyTorch to partial aggregation in trito…
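
The request above is cut off, so the following is only a rough illustration of what "partial aggregation" usually means for a column-wise gradient sum. It is plain Python, not Triton and not the Liger-Kernel code. The function names, the `n_programs` parameter, and the round-robin row assignment are all assumptions made for the sketch.

```python
def full_then_sum(row_grads):
    """Baseline: keep one gradient row per input row and reduce at the end."""
    n_cols = len(row_grads[0])
    return [sum(row[c] for row in row_grads) for c in range(n_cols)]


def partial_aggregation(row_grads, n_programs):
    """Each simulated program accumulates its rows into one partial buffer.

    The row-to-program mapping (round-robin) is a hypothetical choice;
    a real kernel would pick its own schedule.
    """
    n_cols = len(row_grads[0])
    partials = [[0.0] * n_cols for _ in range(n_programs)]
    for row_idx, row in enumerate(row_grads):
        pid = row_idx % n_programs
        for c in range(n_cols):
            partials[pid][c] += row[c]
    # The final reduction touches n_programs rows, not one row per input row.
    return [sum(p[c] for p in partials) for c in range(n_cols)]
```

Under these assumptions, the intermediate buffer shrinks from one row per input row to one row per program, and only the last, small reduction remains outside the accumulation loop.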
-
```py
def test_symbolic_rw_in_array_mode():
    code = {
        0x1000: bytes.fromhex("FD030091"),  # mov x29, sp
        0x1004: bytes.fromhex("FF4300D1"),  # sub sp, sp, #16
        0x1008: by…
-
Running into a build error. Is anyone else getting this?
```
[ 2%] Building CXX object src/libtriton/CMakeFiles/triton.dir/arch/arm/aarch64/aarch64Cpu.cpp.o
[ 2%] Building CXX object src/libtriton/C…
-
If you're planning to standardize this API in some way, it would be great to integrate Songlin Yang's excellent new Triton RWKV-6 implementation from FLA
https://github.com/sustcsonglin/flash-linear…
-
**Description**
Triton receives a SIGSEGV while handling traffic. The last thing it wrote out was `E0723 11:57:36.328641 1 infer_handler.h:187] ""[INTERNAL] Attempting to access current response …
-
Thanks for the wonderful work.
When running Mamba2, I encountered the error "Triton Error [CUDA]: device kernel image is invalid".
Could you please provide some advice?
My enviro…
-
### **Problem:**
When using model-analyzer with `--triton-launch-mode=remote`, I encounter connectivity issues.
### **Context:**
I have successfully started Triton Inference Server on the same ser…
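
Before involving model-analyzer, it can help to confirm that the server is reachable at all. The sketch below polls Triton's HTTP readiness endpoint (`/v2/health/ready`) using only the standard library. The helper names, the default port of 8000, and the timeout are assumptions; adjust them to match how the server was actually started.

```python
import http.client
import urllib.request


def triton_ready_url(host, http_port=8000):
    """Build the readiness URL; port 8000 is assumed to be the HTTP port."""
    return f"http://{host}:{http_port}/v2/health/ready"


def is_triton_ready(url, timeout=2.0):
    """Return True only if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

For example, `is_triton_ready(triton_ready_url("localhost"))` returning `False` from the machine that runs model-analyzer would suggest the problem lies in networking or port mapping rather than in model-analyzer itself.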
-
As in #355, I added `@torch.compile(options={"triton.cudagraphs": True}, fullgraph=True)` to the `mamba_chunk_scan_combined` function in `ssd_combined.py`, and running it failed with this error:
```
Unsup…
-
https://github.com/triton-lang/triton/blob/95623038c75463286aa5d4a44782ba7492cc1afa/python/triton/language/semantic.py#L761C1-L763C1
How can I resolve this?