-
Hey,
I am working closely with your code.
I would like to ask a favor: could you please help me understand the CUDA kernels?
My email address is thomasc@helix.re
I have benchmark you…
-
I'm trying to build XLA for GPU by following this guide: https://openxla.org/xla/developer_guide. Configuration goes just fine:
```
$ docker exec xla ./configure.py --backend=CUDA
INFO:root:Try…
```
-
I am facing an error, `AWQ kernels could not be loaded.`, with autoawq==0.2.4.
- image nvcr.io/nvidia/pytorch:23.10-py3
- Python 3.10
- CUDA 12.2.2
- torch 2.1.0a0+32f93b1
Build and install f…
-
I have tried to quantize a model by following the guide ([PyTorch Quantization — Model Optimizer 0.15.0](https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_pytorch_quantization.html)), and I ca…
-
**Is your feature request related to a problem? Please describe.**
It's difficult to match CUDA kernel names in profiles with locations in the code:
**Describe the solution you'd like**
You c…
-
### 🐛 Describe the bug
This mostly follows NVIDIA's guide for conditional nodes from [here](https://developer.nvidia.com/blog/dynamic-control-flow-in-cuda-graphs-with-conditional-nodes/). It does a…
-
### 🚀 The feature, motivation and pitch
I found that some kernels use 32-bit integers as indices, which can easily lead to overflow. I think changing them to int64_t (or another 64-bit type) will be sa…
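To illustrate the overflow concern, here is a minimal sketch in plain Python. It emulates 32-bit signed arithmetic with `ctypes.c_int32`, which wraps silently instead of raising. The helper names (`linear_index`, `as_int32`, `index_overflows_int32`) and the 50000 x 50000 shape are hypothetical and chosen only for illustration; they are not taken from any kernel in the project.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def linear_index(row, col, num_cols):
    """Flattened (row-major) index, computed with Python's unbounded ints."""
    return row * num_cols + col


def as_int32(value):
    """Emulate storing `value` in a signed 32-bit integer (wraps on overflow)."""
    return ctypes.c_int32(value).value


def index_overflows_int32(shape):
    """Return True if the last flattened index of `shape` exceeds INT32_MAX."""
    num_elements = 1
    for dim in shape:
        num_elements *= dim
    return num_elements - 1 > INT32_MAX


# Hypothetical 50000 x 50000 tensor: 2.5e9 elements, more than INT32_MAX.
true_index = linear_index(49999, 49999, 50000)
wrapped_index = as_int32(true_index)

print(true_index)     # exact index of the last element
print(wrapped_index)  # negative value a 32-bit index would hold instead
print(index_overflows_int32((50000, 50000)))
```

A check like `index_overflows_int32` could be used on the host side to decide whether a 32-bit index path is safe for a given shape, which is one common alternative to switching every kernel to 64-bit indices unconditionally.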
-
OS: Windows 11 22H2
C++ Compiler: MSVC 2022
Python: 3.8
CUDA Toolkit: release 11.8, V11.8.89
PyTorch: 2.0.1
I am trying to install torch-points-kernels 0.6.10 via pip,
and I got this:
```
No CUDA ru…
```
-
Wonderful work! I have the following questions and look forward to your reply.
1) I am curious about the method in your paper that copies the KV cache from CPU memory to GPU memory.
Since I have tested the following…
-
UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation...
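Since the warning points at a possibly missing CUDA toolkit, a quick first check is whether the toolkit is visible from the Python process at all. The sketch below uses only the standard library. The function name `find_cuda_toolkit` is hypothetical, and it is an assumption that `nvcc` on `PATH` or the `CUDA_HOME` / `CUDA_PATH` environment variables are relevant to how a given Triton build locates the toolkit.

```python
import os
import shutil


def find_cuda_toolkit():
    """Report common hints that a CUDA toolkit is installed and visible.

    This only inspects PATH and two conventional environment variables;
    it does not verify that the toolkit version matches the GPU driver.
    """
    return {
        # Full path to the nvcc compiler if it is on PATH, else None.
        "nvcc": shutil.which("nvcc"),
        # Conventional toolkit root variables (CUDA_PATH is typical on Windows).
        "cuda_home": os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH"),
    }


info = find_cuda_toolkit()
if info["nvcc"] is None and info["cuda_home"] is None:
    print("No CUDA toolkit found via PATH, CUDA_HOME, or CUDA_PATH.")
else:
    print(f"nvcc: {info['nvcc']}, toolkit root: {info['cuda_home']}")
```

If both entries come back empty, installing the toolkit or exporting its location would be a reasonable next step before looking for other causes of the fallback.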