-
The code in the `llvm-target` branch is a fork of the OpenAI Triton code with modifications in several files. The structure of the project mirrors the structure of the AMD port. This work item objecti…
-
Hello,
I have pretrained a model with Hugging Face and attempted to deploy it using the TRTLLM-Triton Server method as documented [here](https://github.com/k2-fsa/sherpa/blob/master/triton/whisper/mod…
-
Hi,
When I tried the FP8 GEMM code in matmul.py, I kept the input "a" in float16 but cast it to FP8 just before the dot-product op by setting AB_DTYPE to tl.float8e4nv (link: https://github.com/…
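To see what that late cast does to the operands, here is a pure-Python sketch (not Triton code) that simulates rounding a float16-range value to the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, max finite value 448), which is what `tl.float8e4nv` denotes. The function name and rounding details are illustrative assumptions, not part of the Triton API:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3-representable value (sketch).

    E4M3: 3 mantissa bits, minimum normal exponent -6, max finite 448.
    Values beyond the range are clamped to +-448 (no infinities in e4m3).
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))        # abs(x) = m * 2**e, m in [0.5, 1)
    exp = e - 1                      # exponent with mantissa in [1, 2)
    if exp < -6:
        # subnormal range: fixed exponent -6, 3 mantissa bits -> step 2**-9
        step = 2.0 ** -9
    else:
        step = 2.0 ** (exp - 3)      # 3 mantissa bits of precision
    q = round(abs(x) / step) * step  # round to nearest representable
    return sign * min(q, 448.0)

# Example: 0.3 is not representable in e4m3; it rounds to 0.3125.
print(quantize_e4m3(0.3))
```

This makes the trade-off visible: casting "a" to FP8 right before `tl.dot` discards mantissa bits, so some accuracy loss relative to a float16 input is expected even before the dot product runs.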
-
Hello, we have measured FP8 GEMM performance using Triton on an NVIDIA H100 (500 W, 1980 MHz). We would like your help in understanding whether this performance is expected.
Since H100 FP8 o…
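One way to frame the question is to convert the measured runtime into achieved TFLOP/s and compare against the advertised peak. The sketch below uses the dense FP8 peak of roughly 1979 TFLOP/s from NVIDIA's H100 SXM datasheet; the shapes and the 1 ms runtime are hypothetical placeholders, not the numbers from this report:

```python
def achieved_tflops(M: int, N: int, K: int, runtime_ms: float) -> float:
    """TFLOP/s for an M x N x K GEMM: 2*M*N*K flops over the runtime."""
    flops = 2 * M * N * K
    return flops / (runtime_ms * 1e-3) / 1e12

# Dense FP8 peak for H100 SXM per the public datasheet (assumption).
PEAK_FP8_TFLOPS = 1979.0

# Hypothetical measurement: an 8192^3 GEMM finishing in 1 ms.
tf = achieved_tflops(8192, 8192, 8192, 1.0)
print(f"{tf:.1f} TFLOP/s = {tf / PEAK_FP8_TFLOPS:.1%} of peak")
```

Reporting the efficiency fraction rather than raw TFLOP/s makes it easier to judge whether a kernel is in the expected range for its shape.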
sryap updated 2 months ago
-
**Is your feature request related to a problem? Please describe.**
I'd like to be able to run vLLM emulating the OpenAI-compatible API, so that vLLM can serve as a drop-in replacement for ChatGPT.
**Describe…
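For a drop-in replacement, a client only needs to build the same request body the OpenAI Chat Completions endpoint accepts and POST it to the local server. The base URL and model name below are illustrative assumptions; the payload shape follows the OpenAI API:

```python
import json

# Hypothetical local endpoint; vLLM's OpenAI-compatible server would
# expose /v1/chat/completions under a base URL like this.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, user_msg: str) -> str:
    """Serialize an OpenAI-style chat completion request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }
    return json.dumps(payload)

body = build_chat_request("my-local-model", "Hello!")
# POST `body` to f"{BASE_URL}/chat/completions" with any HTTP client;
# existing OpenAI SDK clients work by pointing their base URL here.
print(body)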
-
hit56 updated 1 month ago
-
The README.md says: "Triton currently supports only Linux and WSL; Windows and macOS are not yet supported, please wait for future updates."
However, Windows and macOS can install it manually from [https://github.com/openai/triton](https://github.com/openai/triton):
## Install from source
```
git clone http…
```
-
I noticed that
> CSE and LICM don't work as expected with `exp` in the loop
is mentioned in `/python/triton/ops/flash_attention.py` (credits to Adam P. Goucher @apgoucher )
Can someone expla…
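As background on what LICM would do with `exp` in a loop, here is a pure-Python illustration (not the Triton compiler's actual transformation): rewriting `exp(x - m)` as `exp(x) * exp(-m)` lets the loop-invariant factor `exp(-m)` be hoisted out of the loop:

```python
import math

def sum_exp_naive(xs, m):
    # The subtraction of the loop-invariant m sits inside the loop, so
    # every iteration evaluates exp on a freshly shifted argument.
    return sum(math.exp(x - m) for x in xs)

def sum_exp_hoisted(xs, m):
    # Manual LICM: exp(x - m) == exp(x) * exp(-m), so the invariant
    # factor exp(-m) is computed once outside the loop.
    scale = math.exp(-m)
    return scale * sum(math.exp(x) for x in xs)

xs = [0.1, 0.5, 1.0]
print(sum_exp_naive(xs, 2.0), sum_exp_hoisted(xs, 2.0))
```

Note the rewrite is only algebraically equivalent: `exp(x)` without the `- m` shift can overflow for large `x` (the shift by the running maximum is exactly what keeps flash attention numerically stable), which is one plausible reason a compiler must be conservative about moving `exp` in such loops.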
-
### Your current environment
Hello,
when the Python Wheel is installed according to your documentation:
https://docs.vllm.ai/en/latest/getting_started/installation.html#install-with-pip
The imag…
ch9hn updated 2 months ago
-
A new v3 of the model was released on 2023-11-06; see the changelog:
https://github.com/openai/whisper/blob/main/CHANGELOG.md