ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License
218 stars, 147 forks

enable VgprForLocalReadPacking + PrefetchLocalRead=1 #1864

Closed: nakajee closed this 9 months ago

nakajee commented 9 months ago