gemm Search Results - Githubissues

1000+ results for gemm


NVIDIA/TensorRT-LLM #2020

[Lookahead] UNAVAILABLE: Internal: unexpected error when cre…

### System Info PyTorch version: 2.3.1+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 OS: Ubuntu 22.04.4 LTS (x86_64) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang v…

cwlseu updated 4 days ago
1
pytorch/pytorch #56703

[Pytorch Mobile] Error running build_pytorch_android.sh

## 🐛 Bug I'm following [Pytorch Vulkan backend user workflow](https://pytorch.org/tutorials/prototype/vulkan_workflow.html#android-java-api) in order to build a libtorch binary that includes Vulkan…

ghost updated 2 years ago
4
jetpacapp/DeepBeliefSDK #12

AndroidExample crashes on Android 4.2 4.3

I found the crash dump in the log. It seems the crash occurs when it frees the matrix in the function cblas_sgemm_fixed. I set libc.debug.malloc to 10 and it reported a rear guard mismatch for 20 bytes. > …

ziggyJ updated 10 years ago
6
clMathLibraries/clBLAS #348

how about the performance on adreno gpu

Hi, are there any benchmarks for Adreno GPUs? Has clBLAS been tuned for Adreno GPUs? I only found an issue about tuning, but no benchmark information. Thanks in advance.

ysh329 updated 4 years ago
1
pytorch/ao #64

[New Feature] CUTLASS kernels for w4a8 quantization

We plan to add QAT for LLMs to torchao (as mentioned in the original RFC here: https://github.com/pytorch-labs/ao/issues/47). For this to run efficiently on the GPU we'd need kernel support for W4A8…

supriyar updated 4 months ago
4
pytorch/pytorch #69506

[performance] a profiler util to show a rough break-down of …

## 🚀 Feature At https://pytorch.slack.com/archives/C3PDTEV8E/p1638511540268500 we were discussing how depending on the model type the different bf16/amp or tf32 modes may or may not do much speed i…

stas00 updated 2 years ago
1
NVIDIA/TensorRT-LLM #743

always be killed when build TensorRT engine

I tried to run llama-7b with TensorRT-LLM and built the TensorRT engine as follows: python3 build.py --model_dir /opt/llms/llama-7b --dtype float16 …

Burning-XX updated 8 months ago
11
flame/blis #629

Excuse me, is the performance evaluation of small/skinny mat…

if ( bli_does_notrans( transa ) ) bli_obj_create( dt, m, k, rs_a, cs_a, &a ); else bli_obj_create( dt, k, m, cs_a, rs_a, &a ); if ( bli_does_notrans( transb ) ) bli_obj_cre…

ProgrammerWLY updated 2 years ago
1
NVIDIA/TensorRT-LLM #1580

Fail to build int4_awq on Mixtral 8x7b

### System Info ubuntu 20.04 tensorrt 10.0.1 tensorrt-cu12 10.0.1 tensorrt-cu12-bindings 10.0.1 tensorrt-cu12-libs 10.0.1 tensorrt-llm 0.10.…

gloritygithub11 updated 2 weeks ago
15
pytorch/pytorch #68105

Some type combinations of cublas gemm are not supported when…

## 🐛 Bug `torch.mm/addmm` call `cublasGemmEx` under the hood. However, there are type combinations that PyTorch claims are unsupported when they should work fine: Example: …

GuillaumeLeclerc updated 2 years ago
2
