-
Does it make sense to extend tensordot to support more than two input arrays?
The definition and API seem to be amenable to this extension, though I can't say anything about the implementation.
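A multi-input `tensordot` can already be emulated by folding the existing two-array `np.tensordot` over a list, which suggests the API extension is at least well-defined. A minimal sketch, where `tensordot_chain` is a hypothetical helper (not part of NumPy) that contracts adjacent arrays pairwise:

```python
import numpy as np
from functools import reduce

# Hypothetical multi-input tensordot: contract a chain of arrays
# pairwise using the existing two-array np.tensordot.
def tensordot_chain(arrays, axes=1):
    return reduce(lambda x, y: np.tensordot(x, y, axes=axes), arrays)

a = np.random.rand(2, 3)
b = np.random.rand(3, 4)
c = np.random.rand(4, 5)

out = tensordot_chain([a, b, c])
assert out.shape == (2, 5)
# With axes=1 and 2-D inputs this reduces to a chained matrix product.
assert np.allclose(out, a @ b @ c)
```

A real implementation would presumably also want to choose the contraction order for efficiency, as `np.einsum` does with `optimize=True`, rather than always folding left-to-right.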
-
This is supported on the rocMLIR side.
This basically amounts to adding pow to the allowed list of pointwise ops in the fuse_mlir pass of MIGraphX, along with a verify test.
DoD:
* A verify test with pow operator trail…
-
Rename at least the functions that are [exported](https://github.com/libxsmm/libxsmm/blob/main/.abi.txt) and do not adhere to best practices and typical API conventions. For example:
* Replace term "cr…
-
### Describe the issue
We are trying to quantize our proprietary model, based on RetinaNet, using TensorRT's model optimization library. The following warning was raised: **"Please consider running pre…
-
As we start to integrate more advanced hybrid methods on the GPU, we are finding that [most numpy functions]() are not supported on the GPU. I think we have two options here: (1) reimplement all operati…
-
For 2-D inputs, `np.matmul` and `np.dot` are semantically the same, but I've found that in some cases `matmul` can be much slower, even though the documentation for `np.dot` says `matmul` is preferred f…
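A quick way to check this locally is to time both calls on the same 2-D inputs; any gap depends on the NumPy build and the BLAS backend it dispatches to, so the sketch below makes no claim about which one wins:

```python
import timeit
import numpy as np

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)

# For 2-D inputs the two functions compute the same product.
assert np.allclose(np.dot(a, b), np.matmul(a, b))

# Time each call; results vary with the installed BLAS backend.
t_dot = timeit.timeit(lambda: np.dot(a, b), number=20)
t_matmul = timeit.timeit(lambda: np.matmul(a, b), number=20)
print(f"dot: {t_dot:.4f}s  matmul: {t_matmul:.4f}s")
```

Reporting both numbers together with `np.show_config()` output would make it easier to tell whether the slowdown is in `matmul` itself or in the BLAS path it takes.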
-
Hardware environment:
[root@iZ6we55nj5ujtoxm12k2wwZ ~]# nvidia-smi
Fri Sep 27 15:14:37 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06 …
-
Hello,
We are using the latest main TensorRT-LLM and a container built with the TensorRT backend to run Mixtral. Generation doesn't stop and runs until max_tokens is reached. Passing "end_id": 2 doesn't help.
…
-
### Problem Description
On the Llama3 70B proxy model, training stalls and the GPUs core-dump. The core dumps are 41 GB per GPU, so I am unable to send them. It is probably easier for y'all to reprod this er…
-
Hello, and thank you for this truly impressive work.
I'm asking because I became curious about a speed benchmark against the marlin kernel.
The [marlin kernel](https://github.com/IST-DASLab/marlin) is one of the 4-bit CUDA kernels and is claimed to be highly optimized.
Could you possibly benchmark against this kernel and compare…