-
Thanks for participating in the TVM community! We use https://discuss.tvm.ai for any general usage questions and discussions. The issue tracker is used for actionable items such as feature proposals d…
-
Hello @AdnanHoque, I am trying to recreate the results from the blog post [Accelerating Llama3 FP8 Inference with Triton Kernels](https://pytorch.org/blog/accelerating-llama3/). I haven't been able to get…
mgoin updated 2 months ago
-
### Your current environment
When running gemma2 7b, an error is reported [rank0]: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &fal…
-
Hi,
We want to implement a strided GEMM in which the dot product for each output element is computed using only the even-indexed elements from the rows/columns of matrix A/B. It seems like the routi…
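For reference, the computation described above is equivalent to an ordinary GEMM over strided views of the inputs. A minimal numpy sketch (illustrative only, not tied to any particular library routine):

```python
import numpy as np

def strided_gemm_even(A, B):
    """C[i, j] = sum over even t of A[i, t] * B[t, j].

    Slicing with step 2 reduces this to a standard GEMM on the
    even-indexed columns of A and even-indexed rows of B.
    """
    return A[:, ::2] @ B[::2, :]

def strided_gemm_even_naive(A, B):
    """Naive triple loop, kept as a correctness reference."""
    m, k = A.shape
    n = B.shape[1]
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for t in range(0, k, 2):
                C[i, j] += A[i, t] * B[t, j]
    return C
```

Standard BLAS-style GEMM interfaces only expose a leading-dimension stride, not a per-element stride along the reduction axis, so in practice this usually means either materializing the strided view or writing a custom kernel.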
-
https://github.com/NVIDIA/cutlass/blob/5c447dd84f8ae0e1d48ff9a2eae26ce8c4958101/include/cutlass/gemm/warp/default_mma_tensor_op.h#L121
https://github.com/NVIDIA/cutlass/blob/5c447dd84f8ae0e1d48ff9a2e…
-
Hi, I am trying to find the best set of 16 tuning parameters for a particular GEMM task: m=32, n=256, k=32, on an Intel KNL machine.
I have tuned GEMM for this particular task using a single process …
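As a sketch of what such a sweep looks like, here is a generic brute-force autotuning loop. The parameter names and the space below are illustrative placeholders, not the actual 16 knobs, and the timed "kernel" is a plain numpy matmul standing in for a real compiled candidate:

```python
import itertools
import time

import numpy as np

# Illustrative tuning space -- placeholder knobs, not the real 16 parameters.
SPACE = {
    "block_m": [8, 16, 32],
    "block_n": [32, 64, 128],
    "unroll": [1, 2, 4],
}

def benchmark(config, m=32, n=256, k=32, reps=5):
    """Time one candidate configuration.

    A real tuner would compile and run a kernel specialized to `config`;
    here a plain matmul at the target problem size stands in.
    """
    A = np.random.rand(m, k)
    B = np.random.rand(k, n)
    start = time.perf_counter()
    for _ in range(reps):
        A @ B
    return (time.perf_counter() - start) / reps

def tune():
    """Exhaustively evaluate the space and return the fastest config."""
    candidates = [dict(zip(SPACE, vals))
                  for vals in itertools.product(*SPACE.values())]
    return min(candidates, key=benchmark)
```

With 16 parameters the Cartesian product explodes, which is why real tuners replace the exhaustive loop with sampled or model-guided search and measure candidates in parallel across processes.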
-
When we use the fp8 data type, we found that the FFN GEMM / attention projection supports real fp8 compute (this is supported on H20, L20), but Q * transpose(Key) and softmax * value in attention don't support fp8 compute, …
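To make the precision trade-off concrete, here is a crude numpy emulation of e4m3 fp8 quantization applied to a per-tensor-scaled GEMM. This is a sketch only: it models the 3 mantissa bits and the ±448 range of e4m3, but ignores subnormals and the exact rounding the hardware uses:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite e4m3 value

def quantize_e4m3(x):
    """Crude e4m3 emulation: clamp to range, round mantissa to 3 bits."""
    x = np.clip(x, -E4M3_MAX, E4M3_MAX)
    m, e = np.frexp(x)             # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0  # keep implicit bit + 3 explicit mantissa bits
    return np.ldexp(m, e)

def fp8_gemm(A, B):
    """Per-tensor-scaled fp8 GEMM: quantize both inputs, accumulate in float64,
    then undo the scales on the output."""
    sa = np.abs(A).max() / E4M3_MAX
    sb = np.abs(B).max() / E4M3_MAX
    Aq = quantize_e4m3(A / sa)
    Bq = quantize_e4m3(B / sb)
    return (Aq @ Bq) * (sa * sb)
```

The per-element rounding error from 3 mantissa bits (up to a few percent) is often tolerable for the large FFN/projection GEMMs, which is one reason implementations commonly keep the Q * Kᵀ and softmax * V products in attention at higher precision.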
-
New Epic item to track Implicit GEMM work. The tasks here are generally listed so that later tasks in a block depend on previous ones.
```[tasklist]
## Common Tasks
- [ ] #13541
- [ ] #13627
- …
-
I encountered the following error while using the quantized Qwen-72B model:
```
out = awq_ext.gemm_forward_cuda(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA ke…
-
**Describe the bug**
When compiling the sample code `examples/16_ampere_tensorop_conv2dfprop/ampere_tensorop_conv2dfprop.cu`, compilation fails with the following error message. Any other example for conv…