-
```
(TinyChatEngine) zhef@zhef:~/TinyChatEngine/llm$ make chat -j
CUDA is available!
src/Generate.cc src/LLaMATokenizer.cc src/OPTGenerate.cc src/OPTTokenizer.cc src/utils.cc src/nn_modules/Fp32OPT…
-
Hi, I have a question regarding custom CUDA kernels and synchronization. I tried to proceed as described in [Interoperability with CUDA](https://arrayfire.org/docs/interop_cuda.htm#gsc.tab=0) which st…
-
Instead of CUDA kernels, would it be possible to use the ND4J library, or could support for it be added?
-
### Describe the feature request
Request:
Leverage `onnxruntime-web` kernels to create a native WebGPU Execution Provider for **non-web** environments.
Story:
I am in a unique situation where my…
-
Int8 matrix multiplication kernels are currently called on CUDA and CPU devices when activations and weights are quantized to int8. However, FP8 matmuls are not used when activations and weights are q…
-
I added 'cl.exe' to PATH, but the build still reports that it cannot be found.
```
error: failed to run custom build command for `candle-kernels v0.5.1 (https://github.com/huggingface/candle.git#cd4d941e)`
Caused b…
-
Hi there,
I have copied s4.py and the kernel extension into another repository I am working on. I had S4 components running (with CUDA), and then I installed the kernel extensions. The build output …
-
### Describe the feature
https://github.com/linkedin/Liger-Kernel
Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU train…
-
I'm testing Unsloth's RoPE kernel, and here is my script:
```python
import torch
from unsloth.kernels.rope_embedding import fast_rope_embedding
from unsloth.models.llama import LlamaRotaryEmbedding as Uns…
-
Will this tool only support hipifying host Fortran calls, or will it also be able to compile CUDA Fortran kernels?
For example:
```
attributes(global) subroutine saxpy(x, y, a)
implicit none
…