-
Example:
Input: A:{c=12 h=64 w=26}, B:{c=12 h=26 w=64}
Output: C:{c=1 h=64 w=64}
The output is reduced to just A[0] * B[0]. I'd like the channel dimension to be supported; the expected result should be C:{c=12 h=64 w=64}.
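The per-channel behavior being requested matches a batched matrix multiply: for each channel c, C[c] = A[c] @ B[c]. A NumPy sketch of the expected semantics, using the shapes from the example above (NumPy is used here only to illustrate the intended result, not the library under discussion):

```python
import numpy as np

# Shapes from the example: A is (c=12, h=64, w=26), B is (c=12, h=26, w=64).
A = np.random.rand(12, 64, 26).astype(np.float32)
B = np.random.rand(12, 26, 64).astype(np.float32)

# Batched matmul: each channel is multiplied independently,
# C[c] = A[c] @ B[c], giving C with shape (12, 64, 64).
C = np.matmul(A, B)
print(C.shape)  # (12, 64, 64)
```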
-
Running testing/gemm.c with only sgemm (the dgemm code commented out) and larger matrices:

```
int loop = 0;
for (loop = 1; loop < 2; loop++) {
    int M = 10000;
    int N = M;
    …
```
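A quick way to sanity-check sgemm throughput outside the C harness is a NumPy sketch: a float32 matmul dispatches to the sgemm of whichever BLAS NumPy links against. M is reduced from 10000 here so the sketch runs quickly; the GFLOP/s formula assumes the standard 2*M^3 flop count for a square GEMM:

```python
import time
import numpy as np

M = 1000  # smaller than the C harness's 10000 to keep the sketch fast
A = np.random.rand(M, M).astype(np.float32)
B = np.random.rand(M, M).astype(np.float32)

t0 = time.perf_counter()
C = A @ B  # single-precision GEMM via the linked BLAS
dt = time.perf_counter() - t0
print(f"M={M}: {dt:.3f} s, {2.0 * M**3 / dt / 1e9:.1f} GFLOP/s")
```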
-
```
RuntimeError: D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:743 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasn't able to be loaded. Please ins…
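When this error appears, a useful first check is whether the installed onnxruntime build exposes the CUDA provider at all, and whether CUDA_PATH points at a real install (a diagnostic sketch; the CPU-only `onnxruntime` and GPU `onnxruntime-gpu` packages are easy to mix up):

```python
import os
import importlib.util

# Is any onnxruntime package importable?
spec = importlib.util.find_spec("onnxruntime")
print("onnxruntime installed:", spec is not None)

if spec is not None:
    import onnxruntime as ort
    # If 'CUDAExecutionProvider' is missing here, the CPU-only package is installed.
    print(ort.get_available_providers())

# The error mentions CUDA_PATH: verify it points at an existing CUDA directory.
cuda_path = os.environ.get("CUDA_PATH")
print("CUDA_PATH =", cuda_path, "exists:", bool(cuda_path and os.path.isdir(cuda_path)))
```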
-
Hi, I want to use the gemx program to compute matrix multiplication on an FPGA.
I'd like to know how to measure the execution time of each of these steps:
1) read data from DDR to FPGA
2) compute …
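One common pattern is to wrap each stage in a wall-clock timer so the transfer and compute times can be reported separately. A generic Python sketch of that pattern follows; the three stage functions are placeholders standing in for the actual GEMX host-code calls, which are not shown in the original post:

```python
import time

def timed(label, fn, *args):
    """Run fn, print its wall-clock time, and return its result."""
    t0 = time.perf_counter()
    out = fn(*args)
    print(f"{label}: {(time.perf_counter() - t0) * 1e3:.2f} ms")
    return out

# Placeholder stages standing in for the real GEMX host-code calls.
def read_ddr_to_fpga(data):   return data       # 1) transfer input
def compute_on_fpga(data):    return sum(data)  # 2) kernel execution
def read_result_back(result): return result     # 3) transfer output

data = list(range(1000))
buf = timed("DDR -> FPGA", read_ddr_to_fpga, data)
res = timed("compute", compute_on_fpga, buf)
out = timed("FPGA -> host", read_result_back, res)
print(out)  # 499500
```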
-
Hello DP,
Good work!
Currently mi-glas has L1 and half of L3. It would be awesome if we unified DBLAS and GLAS.
The advantages of such an integration:
1. A ready-to-use, full-featured BLAS!
2. BLA…
-
On https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul
the example runs fine with the existing small m, n, k, but unfortunately when I change m, n, k to 8192, I get a runti…
-
Is anyone interested in exploiting sparsity to accelerate DNNs?
I am working on the fork https://github.com/wenwei202/caffe/tree/scnn and currently, on average, achieve ~5x CPU and ~3x GPU layer-wi…
-
I use AWQ to quantize Llama 2 70B-chat with:
```
CUDA_VISIBLE_DEVICES="1,2,3,4,5,6,7" python quantize_llama.py
```
The contents of quantize_llama.py:
```
from awq import AutoAWQForCausalLM
from tr…
```
-
Hi there,
I am checking the `TC - tensor core usage` counter for a standard ResNet-50 model, and although I see Tensor Core kernels being invoked, their corresponding `TC` counter still shows `-`. Am I do…
-
Hello,
I have pretrained a model with Hugging Face and attempted to deploy it using the TRTLLM-Triton Server method as documented [here](https://github.com/k2-fsa/sherpa/blob/master/triton/whisper/mod…