-
Hi, I'm following the [setup guide](https://github.com/triton-inference-server/fastertransformer_backend#setup).
I found a bug and solved it.
https://github.com/triton-inference-server/fastertra…
-
I am running some power-consumption experiments using NVML and a CUDA GEMM implementation. I measured the following power-consumption trend when multiplying two 16384 × 16384 square matrices. …
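For readers reproducing this kind of measurement, here is a minimal sketch of how sampled power readings could be turned into energy and average power. It assumes the readings were already collected as `(time_in_seconds, power_in_milliwatts)` pairs (NVML's `nvmlDeviceGetPowerUsage` reports milliwatts); the function names and the sample values are hypothetical and not taken from the original post.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, power in milliwatts)


def energy_joules(samples: List[Sample]) -> float:
    """Integrate power over time with the trapezoidal rule.

    Assumes samples are sorted by timestamp. Power is converted
    from milliwatts to watts before integrating.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        mean_watts = 0.5 * (p0 + p1) / 1000.0
        total += mean_watts * (t1 - t0)
    return total


def average_power_watts(samples: List[Sample]) -> float:
    """Average power over the sampled window (energy divided by duration)."""
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        raise ValueError("need at least two samples spanning a positive duration")
    return energy_joules(samples) / duration


if __name__ == "__main__":
    # Hypothetical readings: power ramps from 100 W to 300 W over 2 s.
    readings = [(0.0, 100_000.0), (2.0, 300_000.0)]
    print(energy_joules(readings), average_power_watts(readings))
```

Integrating rather than averaging the raw readings keeps the result meaningful when the sampling interval is uneven, which is common when polling from a host thread while a kernel runs.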
-
Hi there
I am checking the `TC - tensor core usage` counter for a standard ResNet-50 model. Although I see tensor core kernels being invoked, their corresponding `TC` counter still shows `-`. Am I do…
-
**What is your question?**
I am learning to use CuTe to build an HGEMM kernel. Tested on an A10 GPU, the CuTe kernel performs well at small problem sizes such as m/n/k = 4096, but I found it's much slower …
-
### System Info
- tensorrtllm_backend built using Dockerfile.trt_llm_backend
- TensorRT-LLM main branch (0.13.0.dev20240813000)
- 8xH100 SXM
- Driver Version: 535.129.03
- CUDA Version: 12.5
…
-
### Describe the issue
MoE unit tests fail on older architectures.
The tests have a particular requirement. If that requirement is not met, it is pointless to run the tests.
### Urgency
_No response…
-
This is the error encountered when running `model.fit`:
AssertionError: AbstractConv2d Theano optimization failed: there is no implementation available supporting the requested options. Did you exclude b…
-
I wanted to use the integer gemm code from 0430cf0, and realized that there currently is no way of performing an operation on transposed matrices while I wanted to perform `A^t A`. In the BLAS context…
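To make the request concrete, here is a minimal pure-Python reference of a BLAS-style integer GEMM with transpose flags. It is only an illustrative sketch: `igemm_ref` and its `trans_a`/`trans_b` parameters are hypothetical names, not the API of the code in 0430cf0, and the loop is written for clarity rather than speed.

```python
from typing import List

Matrix = List[List[int]]


def igemm_ref(a: Matrix, b: Matrix,
              trans_a: bool = False, trans_b: bool = False) -> Matrix:
    """Compute op(A) @ op(B) for integer matrices.

    op(X) is X when the flag is False and X transposed when it is True,
    mirroring the transpose arguments of BLAS GEMM. No copy of the
    transposed operand is made; only the indexing changes.
    """
    def at(mat: Matrix, i: int, j: int, trans: bool) -> int:
        return mat[j][i] if trans else mat[i][j]

    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    m, k = (cols_a, rows_a) if trans_a else (rows_a, cols_a)
    k_b, n = (cols_b, rows_b) if trans_b else (rows_b, cols_b)
    if k != k_b:
        raise ValueError("inner dimensions of op(A) and op(B) do not match")

    return [
        [sum(at(a, i, p, trans_a) * at(b, p, j, trans_b) for p in range(k))
         for j in range(n)]
        for i in range(m)
    ]


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]          # 3 x 2
    print(igemm_ref(A, A, trans_a=True))  # A^T A, a 2 x 2 symmetric matrix
```

Passing the same matrix for both operands with `trans_a=True` gives `A^T A` without materializing the transpose, which is the use case described in the post.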
-
Hi, I'm trying to build the pybind11 extension from the onemkl_gemv example, using DPCTL built with CUDA:
https://github.com/IntelPython/dpctl/tree/master/examples/pybind11/onemkl_gemv
Example men…