ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library
https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/index.html
MIT License

Conditionally populate _format9 array in scoped function #1234

Open ellosel opened 1 month ago

ellosel commented 1 month ago

The validMFMA and validSMFMA dictionaries are built in the global scope, which means they are replicated in every process and increase the memory consumption of TensileCreateLibrary. This change moves the creation of the _format9 entry in these dictionaries into assignGlobalParameters and makes populating that entry conditional. By default, TensileCreateLibrary does not create the _format9 entry; the behavior of Tensile is unchanged.
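The pattern described above can be illustrated with a minimal sketch. It is not the actual Tensile code: the table contents, the `build_format9` helper, the `populate_format9` flag, and the `assign_global_parameters` function (a stand-in for assignGlobalParameters) are all hypothetical and only show the idea of deferring a large entry from module scope to a scoped, conditional step.

```python
# Hypothetical sketch; the real Tensile tables, names, and signatures differ.

# Only the small per-type entries live at module scope. The large "_format9"
# entry is deliberately NOT built here, so importing this module in each
# process does not pay for it.
validMFMA = {
    "H": [[32, 32, 8, 1]],   # placeholder values, not real instruction data
    "S": [[16, 16, 4, 1]],   # placeholder values, not real instruction data
}


def build_format9(table):
    """Stand-in for the real construction: flatten all per-type lists."""
    flat = []
    for key, instructions in table.items():
        if key.startswith("_"):
            continue  # skip derived entries such as "_format9" itself
        flat.extend(instructions)
    return flat


def assign_global_parameters(populate_format9=True):
    """Populate the "_format9" entry only when the caller asks for it."""
    if populate_format9 and "_format9" not in validMFMA:
        validMFMA["_format9"] = build_format9(validMFMA)
    return validMFMA
```

In this sketch, a caller that plays the role of TensileCreateLibrary would pass `populate_format9=False` and never allocate the entry, while a caller that plays the role of Tensile would keep the default and see the same table as before.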

This change reduces peak memory consumption from ~78 GB to ~70 GB (about 8 GB, roughly 10%) when building for gfx942 with 64 jobs.