-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
No
### Source
source
### TensorFlow version
tensorflow-lite:2.16.1
### Custom code
No
### OS platform and distribu…
-
Hello, we have measured the FP8 GEMM performance using Triton on NVIDIA H100 (500 W, 1980 MHz). We would like to request your help in understanding if the performance is expected.
Since H100 FP8 o…
sryap updated
4 months ago
-
New Epic item to track Implicit GEMM work. The tasks here are generally listed so that later tasks in a block depend on previous ones.
```[tasklist]
## Common Tasks
- [ ] #13541
- [ ] #13627
- …
-
I'm facing a problem with an NCCL kernel overlapping with a CUTLASS GEMM kernel.
I used a cutlass gemm kernel with a grid size of and my GPU has 142 SMs, so apparently there is a surplus of SMs. Then I…
-
**What is your question?**
An internal CUTLASS error is observed when I try increasing the warp count for kernel "cutlass_simt_hgemm_256x128_8x2_nt_align1" to values other than the default 4x2x1 (by changi…
-
Lead to Suboptimal Shared Memory Reuse.
PR #9341 introduced liveness analysis to merge the shared memory allocations, places touched buffer records at the outermost scope (e.g., outer loops) rathe…
-
### Feature description
Introduction of ONNX Gemm operation conversion
https://onnx.ai/onnx/operators/onnx__Gemm.html
### Feature motivation
Useful for optimisations
Currently get:
``…
-
### Software environment
```Markdown
paddle2onnx 1.2.3
paddlefsl 1.1.0
paddlenlp 3.0.0b1
paddleocr 2.8.1
paddlepaddle 2.6.2
paddlepaddl…
-
**Output of 'strings libarm_compute.so | grep arm_compute_version':**
arm_compute_version=v23.11 Build options: {'Werror': '0', 'debug': '0', 'neon': '1', 'opencl': '0', 'embed_kernels': '0', 'os…
-
CPU Info
$ lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): …