-
Thanks for participating in the TVM community! We use https://discuss.tvm.ai for any general usage questions and discussions. The issue tracker is used for actionable items such as feature proposals d…
-
Hello @AdnanHoque, I am trying to recreate the results from the blog post [Accelerating Llama3 FP8 Inference with Triton Kernels](https://pytorch.org/blog/accelerating-llama3/). I haven't been able to get…
mgoin updated 2 months ago
-
### Your current environment
When running gemma2 7b, an error is reported [rank0]: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &fal…
-
Hi,
We want to implement a strided GEMM in which the dot product for each output element is computed using only the even-indexed elements from the rows/columns of matrix A/B. It seems like the routi…
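For reference, the computation described above is equivalent to an ordinary GEMM over strided views of the inputs. A minimal numpy sketch (illustrative only, not tied to any particular library routine):

```python
import numpy as np

def strided_gemm_even(A, B):
    """C[i, j] = sum over even t of A[i, t] * B[t, j].

    Slicing with step 2 reduces this to a standard GEMM on the
    even-indexed columns of A and even-indexed rows of B.
    """
    return A[:, ::2] @ B[::2, :]

def strided_gemm_even_naive(A, B):
    """Naive triple loop, kept as a correctness reference."""
    m, k = A.shape
    n = B.shape[1]
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for t in range(0, k, 2):
                C[i, j] += A[i, t] * B[t, j]
    return C
```

Standard BLAS-style GEMM interfaces only expose a leading-dimension stride, not a per-element stride along the reduction axis, so in practice this usually means either materializing the strided view or writing a custom kernel.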
-
https://github.com/NVIDIA/cutlass/blob/5c447dd84f8ae0e1d48ff9a2eae26ce8c4958101/include/cutlass/gemm/warp/default_mma_tensor_op.h#L121
https://github.com/NVIDIA/cutlass/blob/5c447dd84f8ae0e1d48ff9a2e…
-
Hi, I am trying to find the best set of 16 tuning parameters for a particular GEMM task: m=32, n=256, k=32, on an Intel KNL machine.
I have tuned GEMM for this particular task using a single process …
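As a sketch of what such a sweep looks like, here is a generic brute-force autotuning loop. The parameter names and the space below are illustrative placeholders, not the actual 16 knobs, and the timed "kernel" is a plain numpy matmul standing in for a real compiled candidate:

```python
import itertools
import time

import numpy as np

# Illustrative tuning space -- placeholder knobs, not the real 16 parameters.
SPACE = {
    "block_m": [8, 16, 32],
    "block_n": [32, 64, 128],
    "unroll": [1, 2, 4],
}

def benchmark(config, m=32, n=256, k=32, reps=5):
    """Time one candidate configuration.

    A real tuner would compile and run a kernel specialized to `config`;
    here a plain matmul at the target problem size stands in.
    """
    A = np.random.rand(m, k)
    B = np.random.rand(k, n)
    start = time.perf_counter()
    for _ in range(reps):
        A @ B
    return (time.perf_counter() - start) / reps

def tune():
    """Exhaustively evaluate the space and return the fastest config."""
    candidates = [dict(zip(SPACE, vals))
                  for vals in itertools.product(*SPACE.values())]
    return min(candidates, key=benchmark)
```

With 16 parameters the Cartesian product explodes, which is why real tuners replace the exhaustive loop with sampled or model-guided search and measure candidates in parallel across processes.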
-
When we use the fp8 data type, we found that the FFN GEMM / attention projection supports real fp8 compute (this is supported on H20, L20), but Q * transpose(Key) and softmax * value in attention don't support fp8 compute, …
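To make the precision trade-off concrete, here is a crude numpy emulation of e4m3 fp8 quantization applied to a per-tensor-scaled GEMM. This is a sketch only: it models the 3 mantissa bits and the ±448 range of e4m3, but ignores subnormals and the exact rounding the hardware uses:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite e4m3 value

def quantize_e4m3(x):
    """Crude e4m3 emulation: clamp to range, round mantissa to 3 bits."""
    x = np.clip(x, -E4M3_MAX, E4M3_MAX)
    m, e = np.frexp(x)             # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0  # keep implicit bit + 3 explicit mantissa bits
    return np.ldexp(m, e)

def fp8_gemm(A, B):
    """Per-tensor-scaled fp8 GEMM: quantize both inputs, accumulate in float64,
    then undo the scales on the output."""
    sa = np.abs(A).max() / E4M3_MAX
    sb = np.abs(B).max() / E4M3_MAX
    Aq = quantize_e4m3(A / sa)
    Bq = quantize_e4m3(B / sb)
    return (Aq @ Bq) * (sa * sb)
```

The per-element rounding error from 3 mantissa bits (up to a few percent) is often tolerable for the large FFN/projection GEMMs, which is one reason implementations commonly keep the Q * Kᵀ and softmax * V products in attention at higher precision.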
-
New Epic item to track Implicit GEMM work. The tasks here are generally listed so that later tasks in a block depend on previous ones.
```[tasklist]
## Common Tasks
- [ ] #13541
- [ ] #13627
- …
-
I encountered the following error while using the quantized Qwen-72B model:
```
out = awq_ext.gemm_forward_cuda(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA ke…
-
**Describe the bug**
When compiling the sample code `examples/16_ampere_tensorop_conv2dfprop/ampere_tensorop_conv2dfprop.cu`, compilation fails with the following error message. Any other example for conv…