Open junliume opened 4 weeks ago
The problem is most likely caused by changes in this file: https://github.com/ROCm/MIOpen/blob/9064d096005af004c64d76a8c4c326a6cfd01c0f/src/solver/conv_direct_naive_conv_fwd.cpp
Similar issues may exist in the other directions too.
After removing these lines, it works properly again: https://github.com/ROCm/MIOpen/blob/9064d096005af004c64d76a8c4c326a6cfd01c0f/src/solver/conv_direct_naive_conv.cpp#L441-L442
So the KDB cache does not accept these parameters. However, why does the refreshed KDB still not accept them? @cderb
@atamazov FYI, this issue and the other issues with the GPU target embedded in the compiler are related. We will branch soon, and we first need to regenerate quite a few KDBs with all fixes merged into the develop branch.
@junliume I recommend reverting #2863, if that is possible. Adding alpha/beta to the existing convolution primitive is nontrivial and should be carefully analyzed prior to implementing.
@junliume
[Analysis] Something in #2863 has caused the incompatibility
- @bghimireamd mentioned likely https://github.com/ROCm/MIOpen/blob/develop/src/kernels/gpu_reference_kernel/naive_conv.cpp#L147-L148
Of course! If a PR changes the number of input arguments, or their types, then KDB must be regenerated. And changing the number of input arguments is a substantial change (see https://github.com/ROCm/MIOpen/pull/2863/files#r1641574996).
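To illustrate why a changed argument list forces a KDB regeneration, here is a toy model. It is not MIOpen code: `CachedBinary`, `KernelCache`, `launch_from_cache` and the key layout are hypothetical names, and the sketch assumes the cache key does not reflect the kernel's argument list. Under that assumption, a lookup still hits after the signature changes, and the stale binary is handed arguments it was not built for.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a precompiled kernel binary: it only records
// how many arguments the kernel expected when it was compiled.
struct CachedBinary
{
    std::size_t expected_args;
};

// Hypothetical cache keyed by (kernel source file, compile options).
// The key carries no information about the kernel's argument list.
using CacheKey    = std::pair<std::string, std::string>;
using KernelCache = std::map<CacheKey, CachedBinary>;

enum class LaunchResult
{
    Ok,
    ArgMismatch,
    NeedsCompile
};

// Try to launch from the cache with the number of arguments
// that the current host code passes.
LaunchResult launch_from_cache(const KernelCache& cache,
                               const CacheKey& key,
                               std::size_t host_args)
{
    const auto it = cache.find(key);
    if(it == cache.end())
        return LaunchResult::NeedsCompile; // miss: build from the current source

    // Hit: the cached binary is used even if the host-side signature changed.
    return it->second.expected_args == host_args ? LaunchResult::Ok
                                                 : LaunchResult::ArgMismatch;
}
```

In this model, an entry built with 10 arguments returns `ArgMismatch` once the host starts passing 12, and erasing the entry turns the lookup into `NeedsCompile`, which matches the observation that deleting the system KDB lets the workload run.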
[Note] Unfortunately, I see that some deeper redesign is required to make this bilinear stuff (alpha/beta) work correctly, see https://github.com/ROCm/MIOpen/pull/2863#discussion_r1641574235.
[Observations]: Reproducer:
and
However, if we delete the system KDB, this workload runs through.
[Analysis] Something in #2863 has caused the incompatibility
FYI: @JehandadKhan @atamazov