ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

[gfx10] ConvOclBwdWrW1x1 fails #670

Open zjing14 opened 3 years ago

zjing14 commented 3 years ago

ROCm 4.0, Navi21

Fp32

MIOpenDriver convfp16 -V 1 -F 0 -n 256 -c 128 -H 28 -W 28 -k 512 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -t 1 
MIOpen Backward Weights Conv. Algorithm: 1, Solution: 25/ConvOclBwdWrW1x1 
Backward Convolution Weights Failed: 0.163441 > 0.0164 
MIOpenDriver convfp16 -V 1 -F 0 -n 256 -c 256 -H 56 -W 56 -k 128 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -t 1 
MIOpen Backward Weights Conv. Algorithm: 1, Solution: 25/ConvOclBwdWrW1x1 
Backward Convolution Weights Failed: 0.103369 > 0.0164 

Fp16

MIOpenDriver convfp16 -V 1 -F 0 -n 256 -c 256 -H 56 -W 56 -k 128 -y 1 -x 1 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -t 1
MIOpen Backward Weights Conv. Algorithm: 1, Solution: 25/ConvOclBwdWrW1x1
Backward Convolution Weights Failed: 0.103369 > 0.0164
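
To compare the failures above, the reported error can be set against the tolerance printed on the same line. The following is a minimal sketch, not part of MIOpenDriver: it assumes only the `Failed: <error> > <tolerance>` line format shown in the logs above, and `parse_failures` is a hypothetical helper name.

```python
import re

# Matches driver failure lines of the form seen in this issue, e.g.
# "Backward Convolution Weights Failed: 0.163441 > 0.0164"
FAILED_RE = re.compile(r"Failed:\s*([0-9.eE+-]+)\s*>\s*([0-9.eE+-]+)")


def parse_failures(log_text):
    """Return (error, tolerance, error / tolerance) for each 'Failed:' line."""
    results = []
    for match in FAILED_RE.finditer(log_text):
        error = float(match.group(1))
        tolerance = float(match.group(2))
        results.append((error, tolerance, error / tolerance))
    return results


if __name__ == "__main__":
    # Log lines copied from the report above.
    log = (
        "Backward Convolution Weights Failed: 0.163441 > 0.0164\n"
        "Backward Convolution Weights Failed: 0.103369 > 0.0164\n"
    )
    for error, tolerance, ratio in parse_failures(log):
        print(f"error={error} tolerance={tolerance} ratio={ratio:.1f}x")
```

For the two distinct failures reported here, the error exceeds the tolerance by roughly 10x and 6.3x respectively.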

atamazov commented 3 years ago

Please fix the typo. [image]

atamazov commented 2 years ago

@junliume I am afraid that this should be reopened (recommended labels: https://github.com/ROCmSoftwarePlatform/MIOpen/labels/urgency_normal https://github.com/ROCmSoftwarePlatform/MIOpen/labels/quality https://github.com/ROCmSoftwarePlatform/MIOpen/labels/performance https://github.com/ROCmSoftwarePlatform/MIOpen/labels/workaround)

Right now we have WORKAROUND_SWDEV_266868. I think that we first need to decide whether to fix the issue or to make the W/A a permanent part of the solver's design.

I would like to mention two circumstances that may affect the decision.

| Config | Without W/A | With W/A | Relative performance |
|---|---|---|---|
| convfp16 -F 4 -n 256 -c 128 -H 28 -W 28 -k 512 -y 1 -x 1 -p 0 -q 0 | 3.1 | 5.2 | 0.60 |
| convfp16 -F 4 -n 256 -c 256 -H 56 -W 56 -k 128 -y 1 -x 1 -p 0 -q 0 | 5.9 | 13.7 | 0.43 |
| conv -F 4 -n 256 -c 128 -H 28 -W 28 -k 512 -y 1 -x 1 -p 0 -q 0 | 6.3 | 15.2 | 0.41 |
| conv -F 4 -n 256 -c 256 -H 56 -W 56 -k 128 -y 1 -x 1 -p 0 -q 0 | 12.4 | 23.4 | 0.53 |
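
The relative performance column can be reproduced from the other two. This is a small sketch under an assumption: the table gives no units, and the ratio is taken to be the "Without W/A" figure divided by the "With W/A" figure, which matches the listed values. The config labels are shortened for readability, and `relative_performance` is a hypothetical helper name.

```python
# (shortened config label, value without W/A, value with W/A), copied from the table.
ROWS = [
    ("convfp16 -c 128 -H 28 -W 28 -k 512", 3.1, 5.2),
    ("convfp16 -c 256 -H 56 -W 56 -k 128", 5.9, 13.7),
    ("conv -c 128 -H 28 -W 28 -k 512", 6.3, 15.2),
    ("conv -c 256 -H 56 -W 56 -k 128", 12.4, 23.4),
]


def relative_performance(without_wa, with_wa):
    """Assumed definition: the 'without W/A' figure over the 'with W/A' figure."""
    return without_wa / with_wa


if __name__ == "__main__":
    for config, without_wa, with_wa in ROWS:
        ratio = relative_performance(without_wa, with_wa)
        print(f"{config}: {ratio:.2f}")
```

Under that reading, the workaround costs between about 40% and 59% of the measured performance on these four configs.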

atamazov commented 2 years ago

@junliume I recommend removing https://github.com/ROCmSoftwarePlatform/MIOpen/labels/bug and https://github.com/ROCmSoftwarePlatform/MIOpen/labels/correctness -- that would better match the current status of the issue.

ppanchad-amd commented 7 months ago

@atamazov Is this fixed in ROCm 6.0.2? Thanks

atamazov commented 7 months ago

@ppanchad-amd Not fixed. The problem needs further investigation, so an engineer must be assigned.

@junliume I recommend adding [gfx11] to the title as WORKAROUND_SWDEV_266868 affects that target as well as gfx10.