ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

test_conv_igemm_dynamic_xdlops_wrw fail when testing half #1015

Open carlushuang opened 3 years ago

carlushuang commented 3 years ago

Tested on a local gfx908 machine and on a server in the US, with rocm-3.10 and rocm-4.2, using the latest develop (120289fcb33496db05518b24e3a39db01d5adb5c). Build steps:

mkdir build && cd build
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_TEST_HALF=On -DMIOPEN_TEST_GFX908=On -DMIOPEN_TEST_ALL=On -DBUILD_DEV=Off -DCMAKE_INSTALL_PREFIX=../install -DCMAKE_BUILD_TYPE=release -DMIOPEN_TEST_FLAGS="--verbose --disable-verification-cache" ..
make test_conv_igemm_dynamic_xdlops_wrw -j64

The build then shows this failure log:

[100%] Built target test_conv2d
Scanning dependencies of target test_conv_igemm_dynamic_xdlops_wrw
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 64, 64, 28, 28 --weights 32, 64, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 5.1932e-05
Max diff: 2
Mismatch at 1: 2086 != 2088
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 64, 64, 28, 28
Weights tensor: 32, 64, 1, 1
Output tensor: 64, 32, 28, 28
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
full test log

```
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 64, 64, 28, 28 --weights 32, 64, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 5.1932e-05
Max diff: 2
Mismatch at 1: 2086 != 2088
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 64, 64, 28, 28
Weights tensor: 32, 64, 1, 1
Output tensor: 64, 32, 28, 28
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 16, 128, 36, 36 --weights 32, 128, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0
Max diff: 0
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 16, 128, 36, 36
Weights tensor: 32, 128, 1, 1
Output tensor: 16, 32, 36, 36
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 64, 64, 56, 56 --weights 256, 64, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 8.25971e-05
Max diff: 4
Mismatch at 2: 2398 != 2400
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 64, 64, 56, 56
Weights tensor: 256, 64, 1, 1
Output tensor: 64, 256, 56, 56
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 64, 224, 17, 17 --weights 224, 224, 1, 7 --pads_strides_dilations 0 3 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0.0523345
Max diff: 491
Mismatch at 0: -115 != -213
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 64, 224, 17, 17
Weights tensor: 224, 224, 1, 7
Output tensor: 64, 224, 17, 17
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 3}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 128, 128, 35, 35 --weights 256, 128, 3, 3 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0.0405109
Max diff: 1137
Mismatch at 0: 2628 != 2582
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 128, 128, 35, 35
Weights tensor: 256, 128, 3, 3
Output tensor: 128, 256, 35, 35
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 128, 128, 64, 64 --weights 256, 128, 3, 3 --pads_strides_dilations 1 1 2 2 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 6.40783e-05
Max diff: 4
Mismatch at 36: 2166 != 2168
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 128, 128, 64, 64
Weights tensor: 256, 128, 3, 3
Output tensor: 128, 256, 32, 32
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {2, 2}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 128, 768, 17, 17 --weights 256, 768, 3, 3 --pads_strides_dilations 1 1 1 1 2 2 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0.0471048
Max diff: 725
Mismatch at 0: -28 != -161
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 128, 768, 17, 17
Weights tensor: 256, 768, 3, 3
Output tensor: 128, 256, 15, 15
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {2, 2},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 3, 256, 28, 28 --weights 80, 256, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0
Max diff: 0
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 3, 256, 28, 28
Weights tensor: 80, 256, 1, 1
Output tensor: 3, 80, 28, 28
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 2, 256, 12, 18 --weights 256, 256, 3, 3 --pads_strides_dilations 1 1 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0.0143045
Max diff: 16
Mismatch at 0: -10 != -3
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 2, 256, 12, 18
Weights tensor: 256, 256, 3, 3
Output tensor: 2, 256, 12, 18
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {1, 1}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 4, 512, 128, 128 --weights 12, 512, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 7.13175e-05
Max diff: 2
Mismatch at 516: -2762 != -2764
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 4, 512, 128, 128
Weights tensor: 12, 512, 1, 1
Output tensor: 4, 12, 128, 128
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 4, 32, 79, 141 --weights 64, 32, 5, 10 --pads_strides_dilations 0 0 2 2 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0.0277231
Max diff: 196
Mismatch at 0: -338 != -283
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 4, 32, 79, 141
Weights tensor: 64, 32, 5, 10
Output tensor: 4, 64, 38, 66
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {2, 2}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 400, 256, 7, 7 --weights 1024, 256, 7, 7 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0
Max diff: 0
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 400, 256, 7, 7
Weights tensor: 1024, 256, 7, 7
Output tensor: 400, 1024, 1, 1
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
/dockerx/repo/MIOpen/build/bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-data --input 400, 256, 1, 1 --weights 1024, 256, 1, 1 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
error: 0
Max diff: 0
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 400, 256, 1, 1
Weights tensor: 1024, 256, 1, 1
Output tensor: 400, 1024, 1, 1
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
[100%] Built target test_conv_igemm_dynamic_xdlops_wrw
```
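To make it easier to see which configurations are far off and which are only slightly off, the log above can be summarized with a small script. This is a minimal sketch in standard-library Python and is not part of MIOpen. The names `Case`, `parse_cases` and `SAMPLE_LOG` are hypothetical, and the parser assumes each result block starts at an `error:` line and ends at a `Filter:` line, as it does in this log. The sample is an excerpt of three cases from the log, with the command lines omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

# Excerpt of the log above; the test_conv2d command lines are omitted.
SAMPLE_LOG = """\
error: 5.1932e-05
Max diff: 2
Mismatch at 1: 2086 != 2088
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 64, 64, 28, 28
Weights tensor: 32, 64, 1, 1
Output tensor: 64, 32, 28, 28
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
error: 0
Max diff: 0
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 16, 128, 36, 36
Weights tensor: 32, 128, 1, 1
Output tensor: 16, 32, 36, 36
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
error: 0.0523345
Max diff: 491
Mismatch at 0: -115 != -213
Backward weights convolution: ConvAsmImplicitGemmGTCDynamicWrwXdlops
Input tensor: 64, 224, 17, 17
Weights tensor: 224, 224, 1, 7
Output tensor: 64, 224, 17, 17
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 3}, {1, 1}, {1, 1},
"""


@dataclass
class Case:
    """One result block from the verbose test_conv2d output (hypothetical helper)."""
    error: float
    max_diff: float = 0.0
    mismatch: Optional[str] = None  # full "Mismatch at ..." line, if present
    input_shape: str = ""
    weights_shape: str = ""


def parse_cases(log: str) -> List[Case]:
    """Collect one Case per 'error:' ... 'Filter:' block."""
    cases: List[Case] = []
    cur: Optional[Case] = None
    for raw in log.splitlines():
        line = raw.strip()
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "error":
            cur = Case(error=float(value))
        elif cur is None:
            continue  # command lines and build messages outside a block
        elif key == "Max diff":
            cur.max_diff = float(value)
        elif key.startswith("Mismatch at"):
            cur.mismatch = line
        elif key == "Input tensor":
            cur.input_shape = value
        elif key == "Weights tensor":
            cur.weights_shape = value
        elif key == "Filter":
            cases.append(cur)
            cur = None
    return cases


cases = parse_cases(SAMPLE_LOG)
# Largest error first, so the badly failing configurations are listed at the top.
for c in sorted(cases, key=lambda c: c.error, reverse=True):
    status = c.mismatch or "exact match"
    print(f"error={c.error:<12g} max_diff={c.max_diff:<6g} "
          f"in=[{c.input_shape}] w=[{c.weights_shape}] {status}")
```

Run over the full log, this kind of summary separates the cases with errors around 1e-5 to 1e-4 from those with errors around 1e-2.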

~However, for CI test like here, this test test_conv_igemm_dynamic_xdlops_wrw is indeed passed.~

However, running the test with ctest -R test_conv_igemm_dynamic_xdlops_wrw does pass. (Adding --verbose to the command line still shows the computation failures.)

root@rocm-framework-11:/dockerx/repo/MIOpen/build# ctest  -R test_conv_igemm_dynamic_xdlops_wrw
Test project /dockerx/repo/MIOpen/build
    Start 71: test_conv_igemm_dynamic_xdlops_wrw
1/1 Test #71: test_conv_igemm_dynamic_xdlops_wrw ...   Passed  101.36 sec

~Will upgrade rocm environment and test again.~ Tested, still fails. Will work with @shaojiewang to check this issue.

cc @atamazov @junliume

shaojiewang commented 3 years ago

For fp16, the precision is about 3 when the number is larger than about 2000.
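As a rough check of this explanation, the spacing between adjacent fp16 values can be computed directly. The sketch below uses only the Python standard library (the `struct` format code `e` is IEEE 754 binary16) and is not MIOpen's verification code. The helpers `fp16_round`, `fp16_ulp` and `ulp_distance` are hypothetical names introduced here. In the range [2048, 4096) the spacing is exactly 2, so a mismatch such as 2086 != 2088 is a single fp16 step.

```python
import math
import struct


def fp16_round(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_ulp(x: float) -> float:
    """Spacing between adjacent binary16 values at the magnitude of x.

    Valid for normal (non-zero, non-subnormal) values only.
    """
    _, exp = math.frexp(abs(x))  # abs(x) = m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp - 11)     # 11-bit significand: 10 stored bits + 1 implicit


def ulp_distance(a: float, b: float) -> float:
    """Difference between a and b, measured in fp16 steps at the larger magnitude."""
    return abs(a - b) / fp16_ulp(max(abs(a), abs(b)))


# Mismatch pairs copied from the log in the first comment.
pairs = [(2086, 2088), (2398, 2400), (-2762, -2764), (2628, 2582), (-115, -213)]
for a, b in pairs:
    print(f"{a} vs {b}: ulp={fp16_ulp(max(abs(a), abs(b)))}, "
          f"distance={ulp_distance(a, b)} ULP")
```

Under these assumptions, the first three pairs are one fp16 step apart, which fits the rounding explanation. The pairs 2628 != 2582 and -115 != -213 are many steps apart, so fp16 spacing alone would not account for those cases.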

ppanchad-amd commented 6 months ago

@carlushuang, please try the latest ROCm 6.0.2 (HIP 6.0.32831) to see if the issue still exists. If it is resolved, please close the ticket. Thanks.

ppanchad-amd commented 3 months ago

@shaojiewang Is this resolved in the latest ROCm 6.1.1? Thanks!