junliume closed this issue 1 year ago
Proposal is something similar to the following?
const auto k = ProblemInterpreter::GetOutputChannelK(problem);
if(k % GetEPackLength(ctx, problem, false) != 0)
    return false;
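To make the proposed guard easier to reason about in isolation, here is a minimal standalone sketch of the same divisibility test. The function name `IsOutputChannelAligned` and its plain-integer parameters are hypothetical; in the solver the two inputs would come from `ProblemInterpreter::GetOutputChannelK(problem)` and `GetEPackLength(ctx, problem, false)`, whose actual values are not stated in this thread.

```cpp
#include <cstddef>

// Hypothetical stand-in for the applicability guard proposed above:
// reject a problem whose output channel count k is not a whole multiple
// of the packing length. Plain integers are used so the sketch compiles
// without any MIOpen headers.
constexpr bool IsOutputChannelAligned(std::size_t k, std::size_t epack_length)
{
    // Assumption: a zero packing length is treated as "not applicable",
    // which also avoids a division by zero.
    return epack_length != 0 && k % epack_length == 0;
}
```

Whether this check alone would reject the failing configurations depends on the packing length the solver actually reports, which would need to be confirmed by the solver developers.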
@junliume @zjing14 @JehandadKhan A couple of ideas:
-DMIOPEN_USE_COMGR=Off -DMIOPEN_USE_HIPRTC=Off
Unfortunately, I don't have MI200/MI100 on hand and am thus unable to check these hypotheses myself.
@junliume
should we explicitly examine ConvHipImplicitGemmForwardV4R4Xdlops applicability with regard to what format of k is required?
Of course, bugs in the solver are also possible, and this is in fact the first hypothesis that comes to mind. The best assignee for this work is an engineer who is fully aware of the kernel design. The problem is that this is difficult and time-consuming work.
/cc @zjing14 @asroy @JehandadKhan
@atamazov some corrections: per the observations, the numerical verifications work for k % 32 == 0, and the root cause might be related to the basic tile sizes and shapes used in CK utilizing xdlops. @zjing14 will propose a patch soon.
@junliume Thanks, I see this K % 32 == 0 thing in the topmost comment. If the solver developers confirm that IsApplicable() must be fixed for FP16 (and possibly for BF16), then we are fine. If not, then I would recommend the experiments listed at https://github.com/ROCmSoftwarePlatform/MIOpen/issues/2284#issuecomment-1659214998 (these are not expected to be time-consuming).
@zjing14 🚀 Thanks for #2297! Do you have time to continue the investigation? Or would it be better to assign another engineer?
@junliume @JehandadKhan I recommend lowering urgency (maybe to https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_normal).
[Problem Observations] On gfx90a nodes:
Something they share in common is the strange output channel number: -k 336.
[Experiments]
k=128
k=64
k=32
k=16
Unless k is very small (i.e. 16 in the above case), the solver usually passes for the k == 2^n format.
@zjing14 @asroy @atamazov @JehandadKhan: should we explicitly examine ConvHipImplicitGemmForwardV4R4Xdlops applicability with regard to what format of k is required?
CC: @averinevg @DrizztDoUrden