-
I want to test the execution of multiple concurrent tasks on the GPU in Dask cuML. I'm using K-means (code below) and I'm changing the chunk size so that I can create multiple tasks for the fit method…
-
internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1505159356788937/
The motivation is that people commonly pack data pointers into a tensor and then pass the tensor to a Triton …
-
### 🐛 Describe the bug
I have a function that calls some custom CUDA kernels interleaved with PyTorch operations. When I try to capture the function with a CUDA graph, the CUDA kernels become no-ops.…
-
Thanks for the great repo!
We are testing the correctness of `flash_attn_varlen_func` with CUDA graphs enabled.
This is the test we use.
https://github.com/vllm-project/vllm/blob/66e832be41cd3f…
-
I am looking for some pointers to get started with leveraging Triton to generate kernels for a custom hardware backend.
I see there have been efforts made that support lowering PyTorch to Triton Ke…
-
Hi, trying to learn spektral here. I can't seem to use ARMAConv:
def create_model(n_nodes, n_node_features, model_layers):
    node_features_input = Input(shape=(n_node_features,), dtype=tf.float3…
-
Hi, I am currently working on profiling [vLLM](https://github.com/ROCm/vllm/tree/main) and I observed that the tool captures the execution of graph kernels at a high level but does not provide detail…
-
### 🐛 Describe the bug
I gathered all the 10K Triton kernels generated by Inductor using the PR stack (https://github.com/pytorch/pytorch/pull/120048). After deduplicating identical kernels used by different …
-
```python
import ComputationalHypergraphDiscovery as CHD
import pandas as pd
df=pd.read_csv('https://raw.githubusercontent.com/TheoBourdais/ComputationalHypergraphDiscovery/main/examples/SachsData.…
```
-
Laptop: ThinkBook 16 G6+ IMH (Intel-based)
OS: Ubuntu 24.04 LTS/GNOME
With some workarounds, I was able to build this kernel module for the 6.8.x kernel(s), and it works pretty well.
But with newer ke…