ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License
208 stars 142 forks source link
amd assembly auto-tuning blas dnn gemm gpu gpu-acceleration gpu-computing hip machine-learning matrix-multiplication neural-networks opencl python radeon tensor-contraction tensors

Tensile is a tool for creating benchmark-driven backend libraries for GEMMs, GEMM-like problems (such as batched GEMM), and general N-dimensional tensor contractions on a GPU. The Tensile library is mainly used as backend library to rocBLAS. Tensile acts as the performance backbone for a wide variety of 'compute' applications running on AMD GPUs.

See Tensile Wiki for documentation.