ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

Do we need many members in PerformanceConfigAsmImplicitGemmGTC? #1268

Open atamazov opened 2 years ago

atamazov commented 2 years ago

Originated from https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1230#discussion_r737878197 (see the whole thread). Synopsis:

As far as I can see, for PerformanceConfigAsmImplicitGemmGTCFwdXdlopsNHWC we only need to store the table index and gemm_k_global_split. The rest of the data can be read from the table directly in GetSolution(). Look at SetNextValue() and you'll see that only the index and gemm_k_global_split are modified.
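To make the proposal concrete, here is a minimal, self-contained sketch of a perf config that stores only a table index and a split flag, and rebuilds everything else from the table in GetSolution(). All names here (KernelParams, GetConfigTable, CompactPerfConfig, Solution) are hypothetical simplifications, not MIOpen's real types, and gemm_k_global_split is assumed to be two-state (0 or 1) purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified table row; real rows carry many more tuning fields.
struct KernelParams
{
    int gemm_m_per_block;
    int gemm_n_per_block;
    int gemm_k_per_block;
};

// Hypothetical stand-in for the generated kernel parameter table.
inline const std::vector<KernelParams>& GetConfigTable()
{
    static const std::vector<KernelParams> table{
        {256, 128, 16},
        {128, 128, 32},
        {64, 64, 8},
    };
    return table;
}

// Only the two values that the tuning loop actually changes are stored.
struct CompactPerfConfig
{
    std::size_t index = 0;       // row in the generated table
    int gemm_k_global_split = 0; // assumed two-state here: 0 = no split, 1 = split

    // Advance to the next candidate; returns false (and wraps to the start)
    // once every (index, split) combination has been visited.
    bool SetNextValue()
    {
        if(gemm_k_global_split == 0)
        {
            gemm_k_global_split = 1; // same table row, now with split
            return true;
        }
        gemm_k_global_split = 0;
        if(index + 1 < GetConfigTable().size())
        {
            ++index; // move to the next table row
            return true;
        }
        index = 0; // search space exhausted
        return false;
    }
};

struct Solution
{
    KernelParams params;
    int gemm_k_global_split;
};

// Everything except the split flag is looked up from the table on demand.
inline Solution GetSolution(const CompactPerfConfig& config)
{
    const auto& table = GetConfigTable();
    assert(config.index < table.size());
    return Solution{table[config.index], config.gemm_k_global_split};
}
```

With three table rows and two split states, iterating from a default-constructed config visits six candidates before SetNextValue() returns false. One trade-off worth weighing: if such an index were persisted (for example in a tuning database), reordering the table would silently change its meaning.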

atamazov commented 2 years ago

Let's discuss.

shaojiewang commented 2 years ago

Alternatively, you can use a type other than PerformanceConfigAsmImplicitGemmGTCFwdXdlopsNHWC for the vector returned by GetFwdXdlopsNHWCConfigList. For example, you can use an aggregate like this (pseudocode):

struct GeneratedData
{
    Direction direction;   // members ordered to match the initializers below
    Layout layout;
    Datatype datatype;
    PerformanceConfigAsmImplicitGemmGTCFwdXdlopsNHWC perfConfig;
};
...
static const inline std::vector<GeneratedData>&
GetFwdXdlopsNHWCConfigList()
{
    static const std::vector<GeneratedData> kernel_param_list {
        {"fwd", "nhwc", miopenFloat, { ... } }, // existing initializer wrapped in an extra pair of curly braces
        ...
    };
    return kernel_param_list;
}
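A compilable sketch of the aggregate idea: the selection keys (direction, layout, datatype) live in the wrapper struct, the perf config is initialized through an inner pair of braces, and a lookup filters the table by problem. The DataType enum, the three-field PerfConfig, and SelectConfigs are hypothetical simplifications; GetFwdXdlopsNHWCConfigList reuses the name from the thread but is only a stand-in, not MIOpen's real function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; they do not mirror MIOpen's real definitions.
enum class DataType { Float, Half };

struct PerfConfig
{
    int gemm_m_per_block;
    int gemm_n_per_block;
    int gemm_k_global_split;
};

// Aggregate that keeps the table-selection keys outside the perf config.
struct GeneratedData
{
    std::string direction;
    std::string layout;
    DataType datatype;
    PerfConfig perfConfig;
};

inline const std::vector<GeneratedData>& GetFwdXdlopsNHWCConfigList()
{
    static const std::vector<GeneratedData> kernel_param_list{
        {"fwd", "nhwc", DataType::Float, {256, 128, 0}}, // inner braces initialize perfConfig
        {"fwd", "nhwc", DataType::Half, {128, 128, 0}},
        {"fwd", "nhwc", DataType::Float, {64, 64, 0}},
    };
    return kernel_param_list;
}

// Collect the perf configs matching a problem, so PerfConfig itself
// no longer needs direction/layout/datatype members.
inline std::vector<PerfConfig>
SelectConfigs(const std::string& direction, const std::string& layout, DataType datatype)
{
    std::vector<PerfConfig> result;
    for(const auto& entry : GetFwdXdlopsNHWCConfigList())
    {
        if(entry.direction == direction && entry.layout == layout &&
           entry.datatype == datatype)
            result.push_back(entry.perfConfig);
    }
    return result;
}
```

For example, SelectConfigs("fwd", "nhwc", DataType::Float) returns the two float entries in table order. Because the keys sit in the wrapper, each perf config carries only its tuning values.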

I agree with that. I can make a PR along these lines.

ppanchad-amd commented 3 months ago

@carlushuang @shaojiewang Is this fixed in the latest ROCm 6.0.2 (HIP 6.0.32831)? If so, please close the ticket. Thanks!