Closed junliume closed 2 months ago
Hi @atamazov if we cannot identify the real cause in #2969 soon, we may need to revert it asap. Thanks!
Hello @junliume. Thanks for triaging the issue. Most likely you are right.
So the real cause is that gemm solvers have issues. #2969 uses real gemm solutions for benchmarking instead of "simulations" and now the issues manifest themselves more often. I suspect that if we revert it, then the issue will be just better camouflaged ;/
Most likely the proper fix is narrowing the applicability of the faulty gemm solvers. Please share the list of vgg16 configs with me if you have it on hand.
Meanwhile I recommend trying export MIOPEN_WORKAROUND_ISSUE_2789=0
. It was introduced in #2811 which extends applicability of gemm solvers. If it helps, then we can begin with changing the default of the variable to false.
@atamazov here is the list of unique configs:
./bin/MIOpenDriver conv -n 1024 -c 3 -H 224 -W 224 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 64 -H 224 -W 224 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 64 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 128 -H 56 -W 56 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 256 -H 56 -W 56 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 256 -H 28 -W 28 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 512 -H 28 -W 28 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 512 -H 14 -W 14 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 512 -H 14 -W 14 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 512 -H 14 -W 14 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 512 -H 28 -W 28 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 512 -H 28 -W 28 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 256 -H 28 -W 28 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 256 -H 28 -W 28 -k 512 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 256 -H 56 -W 56 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 256 -H 56 -W 56 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 128 -H 56 -W 56 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 128 -H 56 -W 56 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 128 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 64 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 64 -H 112 -W 112 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 4 -t 1 -V 0
./bin/MIOpenDriver conv -n 1024 -c 64 -H 224 -W 224 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
@junliume Thanks! Can you let me know if export MIOPEN_WORKAROUND_ISSUE_2789=0
helps, if possible to try it.
@junliume Thanks! Can you let me know if
export MIOPEN_WORKAROUND_ISSUE_2789=0
helps, if possible to try it.
When I do
MIOPEN_WORKAROUND_ISSUE_2789=0 MIOPEN_DEBUG_DISABLE_FIND_DB=1 MIOPEN_DEBUG_FIND_ONLY_SOLVER=GemmBwdRest ./bin/MIOpenDriver conv -n 2048 -c 64 -H 224 -W 224 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
I can reliably hit Memory access fault by GPU node-2 (Agent handle: 0x2c9cb70) on address 0x7ec59a607000. Reason: Unknown.
In this case work space size is 115605504
(which is less then 7287183769
) so export MIOPEN_WORKAROUND_ISSUE_2789=0
would not have any effect no?
@junliume Thanks! Can you let me know if
export MIOPEN_WORKAROUND_ISSUE_2789=0
helps, if possible to try it.
I tried the env var but it is not effective, we still see the mem fault errors :(
@bghimireamd @junliume Thanks!
MIOPEN_WORKAROUND_ISSUE_2789=0 MIOPEN_DEBUG_DISABLE_FIND_DB=1 MIOPEN_DEBUG_FIND_ONLY_SOLVER=GemmBwdRest ./bin/MIOpenDriver conv -n 2048 -c 64 -H 224 -W 224 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
It's likely caused by here: https://github.com/ROCm/MIOpen/blob/2de28ab6870fede26536fd9fa0fde51d2d5542e5/src/kernels/MIOpenCol2Im2d.cl#L69
MIOPEN_WORKAROUND_ISSUE_2789=0 MIOPEN_DEBUG_DISABLE_FIND_DB=1 MIOPEN_DEBUG_FIND_ONLY_SOLVER=GemmBwdRest ./bin/MIOpenDriver conv -n 2048 -c 64 -H 224 -W 224 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
It's likely caused by here:
when I comment all the code except for https://github.com/ROCm/MIOpen/blob/2de28ab6870fede26536fd9fa0fde51d2d5542e5/src/kernels/MIOpenCol2Im2d.cl#L89 I still see the error. May be im_offset is off lol
@junliume @bghimireamd Let follow this rule of thumb: https://github.com/ROCm/MIOpen/blob/7cf8180f80174be34ad68c5b988d5158f2602045/src/solver/pooling/forwardNaive.cpp#L125-L131
Unfortunately this bug is not reproducible with my 16GiB card.
export MIOPEN_WORKAROUND_ISSUE_2789=0
This case?
MIOPEN_DEBUG_DISABLE_FIND_DB=1 MIOPEN_DEBUG_FIND_ONLY_SOLVER=GemmBwdRest ./bin/MIOpenDriver conv -n 2048 -c 64 -H 224 -W 224 -k 64 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 2 -t 1 -V 0
If I change to
long ch_offset = im_ch col_w col_h wei_w wei_h;
it almost feels stuck and may run for a very long time. I think "simulated" GEMM solution might be more useful sometimes than the real ones :)
@atamazov could you compose a PR to make #2969 optional (enable by env var but it;s off by default)? That way we do not need to revert it, and we can resolve this urgent issue for now. Thanks!
@junliume This seems complicated because the "simulation" code was removed, but possible as a last resort. Note that reverting (or env control) is just another way of camouflaging the real problem. We can simply add missing find-db records (for vgg) and obtain very similar effect.
Let me try to make a full fix. The solver should not be applicable when col2im's limit is hit (i.e. offset overflow happens).
@junliume I am afraid this would not prevent overflow:
long ch_offset = im_ch col_w col_h wei_w wei_h;
It still performs all 4 muls in 32 bit domain, then cvt to i64.
please try this:
long ch_offset = (long)(im_ch * col_w) * (col_h * wei_w * wei_h);
2+1 muls in 32 bit domain, plus two conversions and 1x 64 bit mul.
Maybe it will show perf better than "stuck". If not, then I would rather simply narrow the applicability.
@junliume I am afraid this would not prevent overflow:
long ch_offset = im_ch col_w col_h wei_w wei_h;
It still performs all 4 muls in 32 bit domain, then cvt to i64.
please try this:
long ch_offset = (long)(im_ch * col_w) * (col_h * wei_w * wei_h);
2+1 muls in 32 bit domain, plus two conversions and 1x 64 bit mul.
Many it will show better perf. If not, then it's better to narrow the applicability.
Sure let me try something similar to this PR https://github.com/ROCm/MIOpen/issues/1956
@junliume #1956 is about 3D, where we do not have enough fast solvers. For 2D I recommend making the solver inapplicable when 64 bit indices are required.
Also it would be nice to switch from i32 to u32.
I was able to PASS the test after I changed from int (i32) to unsigned int(u32). I had to do the same in https://github.com/ROCm/MIOpen/blob/2de28ab6870fede26536fd9fa0fde51d2d5542e5/src/ocl/utilocl.cpp#L420
@atamazov this issue does not appear if we revert #2969
[Observation]: If we run
vgg16
model on 8 GPU concurrently:we will oserve
export MIOPEN_DEBUG_CONV_GEMM=0
can avoid the crash too.So the running theory: