-
Hello Team,
1. How can the Python Triton code from the tutorials at https://github.com/openai/triton/tree/main/python/tutorials be converted to LLVM IR or other intermediate representations?
2. Do we have any langua…
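For question 1, a hedged sketch of one way to look at the generated IR (the `.asm` dictionary on the launch handle and its keys such as `llir` vary across Triton versions, so treat this as illustrative rather than the official workflow):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
out = torch.empty_like(x)
# In many Triton releases the launch returns a handle to the compiled kernel.
handle = add_kernel[(triton.cdiv(x.numel(), 128),)](x, y, out, x.numel(), BLOCK=128)
# The compiled artifact typically exposes the lowering stages, e.g.
# 'ttir' (Triton IR), 'ttgir' (Triton GPU IR), 'llir' (LLVM IR) and 'ptx'.
print(handle.asm.keys())
print(handle.asm.get("llir", "")[:500])
```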
-
Hi! The [flash attention implementation](https://github.com/openai/triton/blob/main/python/triton/ops/flash_attention.py) is really helpful as a reference. I noticed that the code currently makes some…
-
**Issue description:**
AttributeError: 'LlamaSplitFuseInferStateInfo' object has no attribute 'logn_values'
…
-
👋 This dashboard summarizes my activity on the repository, including available improvement opportunities.
## Recommendations
_Last analysis: Jun 15 | Next scheduled analysis: Jun 22_
### Open
- h…
-
### 🐛 Describe the bug
When I compute 2.0 ** s on CUDA for very small s, so that the result lands in the float32 denormal range, the result from PyTorch eager mode is the correct denormalized floating point…
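For context, a minimal repro sketch of the eager-mode side of this report (the exponent `-140` is an assumption chosen so that `2 ** s` lands in the float32 denormal range; the compiled path that reportedly flushes to zero is not shown here):

```python
import torch

# 2**-140 ~= 7.2e-43 is below the smallest normal float32 (~1.18e-38) but
# above the smallest denormal (~1.4e-45), so the exact result is denormal.
s = torch.tensor([-140.0], device="cuda", dtype=torch.float32)
eager = 2.0 ** s
print(eager.item())         # eager mode reportedly keeps the tiny denormal value
print(eager.item() > 0.0)   # a flush-to-zero implementation would print False
```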
-
When executing the script `examples/offline_inference_with_prefix.py`, it calls `context_attention_fwd` from `vllm.model_executor.layers.triton_kernel.prefix_prefill`, which triggered the following er…
-
Could tl.dot support the mma 32x8x16 (m-n-k) shape, which is supported by the tensor cores?
In the process of developing operators with Triton, it's essential to minimize the N dimension of blocks as much as possibl…
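To make the constraint concrete, here is a hedged sketch of the workaround needed today (assuming `tl.dot` requires every block dimension to be at least 16, which is my understanding of the current restriction): an output with N = 8 has to be padded to `BLOCK_N = 16` and masked on load and store, wasting half of the tile.

```python
import triton
import triton.language as tl

@triton.jit
def small_n_matmul(a_ptr, b_ptr, c_ptr, M, N, K,
                   BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # Row-major A (M, K), B (K, N), C (M, N); one program per BLOCK_M rows.
    pid_m = tl.program_id(0)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)   # padded to 16 even when N == 8
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        offs_k = k + tl.arange(0, BLOCK_K)
        a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :],
                    mask=(offs_m[:, None] < M) & (offs_k[None, :] < K), other=0.0)
        b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :],
                    mask=(offs_k[:, None] < K) & (offs_n[None, :] < N), other=0.0)
        acc += tl.dot(a, b)
    c_mask = (offs_m[:, None] < M) & (offs_n[None, :] < N)
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], acc, mask=c_mask)

# Hypothetical launch: N is only 8, but BLOCK_N still has to be 16.
# small_n_matmul[(triton.cdiv(M, 64),)](a, b, c, M, 8, K, BLOCK_M=64, BLOCK_N=16, BLOCK_K=32)
```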
-
Hello Triton team, I did some quick profiling of the Triton matmul kernel https://github.com/openai/triton/blob/main/python/triton/ops/matmul.py using the PyTorch profiler.
![image](https://github.com/ope…
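In case it helps to reproduce the numbers, a hedged sketch of how such a profile can be collected (this assumes an older Triton release where `triton.ops.matmul` is importable, as in the linked file; shapes and dtypes are arbitrary):

```python
import torch
from torch.profiler import profile, ProfilerActivity
import triton.ops

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

triton.ops.matmul(a, b)  # warm-up so autotuning stays outside the profiled region

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    triton.ops.matmul(a, b)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```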
-
### Your current environment
```text
Collecting environment information...
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
warni…
-
Currently, it is always `None`, which defaults to `float32`:
https://github.com/openai/triton/blob/f21b36c8c54f35a88e96d7217e2c6bc9cc02ee69/python/test/unit/operators/test_matmul.py#L179
I believe…
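For readers skimming without the test open, a hedged illustration of the pattern being described (the function name is made up for illustration, not the actual test code): a dtype argument that is always left as `None` silently resolves to `float32`, so only the default path ever gets exercised.

```python
import torch

def resolve_acc_dtype(dtype=None):
    # Illustrative only: when callers never pass a dtype, this always resolves
    # to float32 and the non-default branches go untested.
    return torch.float32 if dtype is None else dtype

assert resolve_acc_dtype() is torch.float32
assert resolve_acc_dtype(torch.float16) is torch.float16
```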