-
```
FAILED language/test_core.py::test_reduce1d[4-1-1-sum-int8-32] - triton.runtime.errors.InterpreterError: InterpreterError("OverflowError('Python integer 351 out of bounds for int8')")
FAILED lan…
-
### System Info
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang v…
-
Within PyTorch's torchinductor we JIT-compile many Triton functions, often 100+. We currently have a mechanism that initializes a [pool of forked processes](https://github.com/pytorch/pytorch/…
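The forked-worker pattern described above can be sketched as follows. This is a minimal illustration, not torchinductor's actual API: `compile_kernel` is a hypothetical stand-in for the expensive JIT compilation step, and the real pool handles warm-up, reuse, and error propagation.

```python
import multiprocessing as mp

def compile_kernel(src: str) -> str:
    # stand-in for an expensive Triton JIT compilation step
    return f"compiled:{src}"

# "fork" lets workers inherit the parent's already-imported state cheaply
# instead of re-importing torch in every worker (POSIX only).
ctx = mp.get_context("fork")
with ctx.Pool(processes=4) as pool:
    binaries = pool.map(compile_kernel, [f"kernel_{i}" for i in range(8)])
print(len(binaries))
```

Forking avoids paying the interpreter/import startup cost per kernel, which matters when there are 100+ functions to compile.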
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
_No …
-
[Flash Attention 3](https://tridao.me/blog/2024/flash3/) makes use of new features of the Hopper architecture:
- (async) WGMMA
- TMA
- overlap softmax
Are these all things that can currently (…
-
### Describe the issue
Hello developers:
I followed the guide `docs/ORT_Use_Triton_Kernel.md` and wanted to use a Triton kernel in ONNX Runtime, but I encountered an error.
**test script:**
```bash
…
-
### Comment:
So, the upstream GitHub repo still does not properly tag releases (see [this issue](https://github.com/triton-lang/triton/issues/3535)). That said, it looks like we do have a [relea…
-
Description of problem:
I ran some experiments comparing the timing of standalone inference with a TensorRT model against Triton serving the same TensorRT model, using identical input on a …
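A fair client-side comparison of the two setups usually times both paths with the same harness. A minimal sketch, where `infer_fn` is a hypothetical stand-in for either the standalone TensorRT call or the Triton client request (both fed identical inputs):

```python
import time

def time_inference(infer_fn, n_warmup=10, n_iters=100):
    # warm-up runs exclude one-time costs (CUDA context init, lazy allocation)
    for _ in range(n_warmup):
        infer_fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        infer_fn()
    # average latency per call, in seconds
    return (time.perf_counter() - start) / n_iters

# usage with a cheap placeholder workload
avg = time_inference(lambda: sum(range(1000)))
print(avg > 0)
```

When comparing against Triton, the measured gap also includes network/serialization overhead, so the two numbers are not pure engine-execution times.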
-
Hi,
Where can I find documentation on how to build the Triton Inference Server TRT-LLM 24.06 image for SageMaker myself, so I can run it on SageMaker?
Nvidia Image i want to use: nvcr.io/nvidia/tritonserver:2…
-
**Description**
The Triton build using `./build.py` fails because a sign-compare warning is promoted to an error by `-Werror=sign-compare`. The warning comes from `response_cache_test.cc` in the `core` repo ([here](http…