hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library.
The `validMFMA` and `validSMFMA` dictionaries are built in the global scope, so they are replicated in every process, which increases the memory consumption of TensileCreateLibrary. We moved the creation of the `_format9` entry in these dictionaries into `assignGlobalParameters` and made populating this entry conditional. By default, TensileCreateLibrary no longer creates the `_format9` entry; the behavior of Tensile is unchanged.
This change reduces the peak memory consumption from ~78 GB to ~70 GB when building for gfx942 with 64 jobs.
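The sketch below illustrates the pattern in Python: a derived entry that used to be computed at import time is instead built inside a setup function, and only when requested. The table contents, the `_expand_format9` helper, and the `create_format9` keyword are hypothetical placeholders, not Tensile's actual code. Only the names `validMFMA`, `validSMFMA`, `_format9`, and `assignGlobalParameters` come from this change.

```python
from itertools import product

# Module-level tables. The contents are placeholders, not Tensile's real
# MFMA instruction lists. Nothing expensive is derived at import time.
validMFMA = {"H": [[32, 32, 8, 1], [16, 16, 16, 1]], "B": [[32, 32, 8, 1]]}
validSMFMA = {"H": [[32, 32, 16, 1]]}


def _expand_format9(table):
    """Hypothetical stand-in for the real _format9 expansion logic."""
    # Gather every base instruction, skipping a previously built _format9.
    base = [inst for key, insts in table.items() if key != "_format9" for inst in insts]
    # Cross each instruction with a few illustrative wave-tile shapes.
    return [inst + [wm, wn] for inst, (wm, wn) in product(base, [(1, 1), (2, 2), (4, 4)])]


def assignGlobalParameters(config, create_format9=True):
    """Sketch: populate _format9 only when the caller asks for it.

    `create_format9` is a hypothetical flag. Tensile would keep the old
    behavior (True); TensileCreateLibrary would pass False by default.
    `config` is accepted for shape only and is unused in this sketch.
    """
    if create_format9:
        for table in (validMFMA, validSMFMA):
            table["_format9"] = _expand_format9(table)
```

Because the derived entry is no longer created as a side effect of importing the module, a caller such as TensileCreateLibrary can skip the allocation entirely, so none of its worker processes carry a copy.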