-
Hi. Is there a way to use rules_cuda to generate PTX? This would be useful for OptiX programming, for example.
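I'm not certain rules_cuda supports PTX output directly; as a workaround sketch, a plain `genrule` could wrap the `nvcc --ptx` invocation such a rule would need to run. The target and file names below are placeholders, not part of rules_cuda:

```python
# Hedged sketch: a Bazel genrule (names are placeholders) wrapping the
# nvcc invocation that emits PTX instead of object code, e.g. for OptiX.
genrule(
    name = "optix_programs_ptx",
    srcs = ["optix_programs.cu"],
    outs = ["optix_programs.ptx"],
    cmd = "nvcc --ptx -o $@ $(location optix_programs.cu)",
)
```

This assumes `nvcc` is on the PATH of the Bazel action; a proper rule would resolve the CUDA toolchain instead.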
-
Add a lecture based on the [numba_cuda notebook](https://colab.research.google.com/github/cbernet/maldives/blob/master/numba/numba_cuda.ipynb#scrollTo=5U0yngpWU1Sg) written by @jstac with an introduct…
-
I cannot reproduce the results with a 3090 graphics card. I suspect the cause lies in the CUDA programming.
-
There are a number of HIP functions that assume a device has been selected for the current thread and operate on that device, for example `hipModuleLoadDataEx`.
We need to set the correct HIP device duri…
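One possible shape for this is a save/select/restore pattern around the device-sensitive call. This is a hedged sketch, not the actual fix; `target_device` and `image` are placeholders, and it uses `hipModuleLoadData` for brevity:

```cpp
// Hedged sketch: save and restore the caller's HIP device around a call
// that implicitly uses the current device, such as hipModuleLoadData(Ex).
#include <hip/hip_runtime.h>

hipError_t load_module_on_device(int target_device, const void* image,
                                 hipModule_t* module) {
    int previous = 0;
    hipError_t err = hipGetDevice(&previous);   // remember caller's device
    if (err != hipSuccess) return err;

    err = hipSetDevice(target_device);          // select the device the
    if (err != hipSuccess) return err;          // module must live on

    err = hipModuleLoadData(module, image);     // operates on current device

    hipSetDevice(previous);                     // restore caller's device
    return err;
}
```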
-
When discussing the Thrust JIT support with the CCCL team, a question was raised regarding the usage of the `jit.thrust.device` policy in the test suite, e.g.:
https://github.com/cupy/cupy/blob/be5d7f…
-
As far as I can tell, stream priority isn't implemented in PyCUDA right now,
although CUDA makes it possible to assign priorities to streams, as documented here:
https://docs.nvidi…
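For reference, this is a hedged sketch of the underlying CUDA runtime calls that PyCUDA would need to expose (lower numbers mean higher priority in CUDA's convention):

```cpp
// Hedged sketch: query the valid stream priority range, then create a
// stream with the highest available priority. Requires a CUDA device.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int least = 0, greatest = 0;
    cudaDeviceGetStreamPriorityRange(&least, &greatest);
    printf("priorities: least=%d greatest=%d\n", least, greatest);

    cudaStream_t high_prio;
    cudaStreamCreateWithPriority(&high_prio, cudaStreamNonBlocking, greatest);
    // ... launch latency-sensitive kernels on high_prio ...
    cudaStreamDestroy(high_prio);
    return 0;
}
```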
-
We should consider whether it is possible and desirable to automatically combine kernels into CUDA graphs to reduce the overhead of launching individual kernels.
Here is the relevant documentation:
- http…
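The mechanism would presumably build on stream capture. As a hedged sketch of the pattern (with `my_kernel` a placeholder), a sequence of launches is recorded once and then replayed with a single graph launch per iteration:

```cpp
// Hedged sketch: record kernel launches into a CUDA graph via stream
// capture, then replay the whole sequence with one launch call.
// Requires a CUDA device; d_data is device memory.
#include <cuda_runtime.h>

__global__ void my_kernel(float* data) { data[threadIdx.x] += 1.0f; }

void run_with_graph(float* d_data, int iterations) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    my_kernel<<<1, 256, 0, stream>>>(d_data);   // captured, not executed
    my_kernel<<<1, 256, 0, stream>>>(d_data);
    cudaStreamEndCapture(stream, &graph);

    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    for (int i = 0; i < iterations; ++i)
        cudaGraphLaunch(exec, stream);          // one call per whole graph
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
}
```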
-
- see https://github.com/ObrienlabsDev/machine-learning/issues/10
## Use Cases
On NVIDIA GPUs, Tensor Cores deliver about 3.5x the performance of CUDA cores
### LLM and Generative AI
- https://github.…
-
I am trying to get TinyLlama working on the GPU with:
```bash
./TinyLlama-1.1B-Chat-v1.0.F32.llamafile -ngl 9999
```
But it seems it is not possible to allocate 66.50 MB of memory on my card, even if I j…
-
I'm not familiar with CUDA programming. Could you explain a little about the key factors in this implementation that bring the performance gain? Thanks a lot!