ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

[MI100][FP16] ConvHipImplicitGemmBwdDataV4R1Xdlops verification issues #1188

Open atamazov opened 2 years ago

atamazov commented 2 years ago
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --input 64 128 28 28 --weights 16 128 1 1 --batch_size 64 --input_channels 128 --output_channels 16 --spatial_dim_elements 28 28 --filter_dims 3 3 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
FAILED: 0.113259
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops
...
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --input 64 128 14 14 --weights 16 128 1 1 --batch_size 64 --input_channels 128 --output_channels 16 --spatial_dim_elements 14 14 --filter_dims 3 3 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
FAILED: 0.116387
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops
...
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --input 16 128 28 28 --weights 16 128 1 1 --batch_size 16 --input_channels 128 --output_channels 16 --spatial_dim_elements 28 28 --filter_dims 3 3 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
FAILED: 0.117367
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops
...
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --input 64 256 56 56 --weights 64 256 1 1 --batch_size 64 --input_channels 256 --output_channels 64 --spatial_dim_elements 56 56 --filter_dims 3 3 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW 
FAILED: 0.116415
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops

Shortened configs:

./bin/test_conv2d --half --input 64 128 28 28 --weights 16 128 1 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --half --input 64 128 14 14 --weights 16 128 1 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --half --input 16 128 28 28 --weights 16 128 1 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --half --input 64 256 56 56 --weights 64 256 1 1 --pads_strides_dilations 0 0 1 1 1 1
carlushuang commented 2 years ago

@atamazov It seems I can't reproduce this on the tip of develop. I built it locally, and the computation is correct. How is this set up?

atamazov commented 2 years ago

This is from CI. Please see runs 2, 3, and 4 of the wa-936-cleanup branch. You may need ROCm 4.3.1 to reproduce it (but that would mean a problem in the 4.3.1 compiler!)

junliume commented 2 years ago

@carlushuang and @atamazov: the issue is not observed with the tip of develop and ROCm 4.5.

atamazov commented 2 years ago

@junliume Did the library choose ConvHipImplicitGemmBwdDataV4R1Xdlops in your tests?

junliume commented 2 years ago

@atamazov Is there a way to force the library to pick this kernel? I still have the Docker container ready, so I can run more quick tests.

atamazov commented 2 years ago

@junliume MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=ConvHipImplicitGemmBwdDataV4R1Xdlops <test or driver command>.

With the driver, use -F 1. With test_conv, use --disable-backward-data --disable-backward-weights. Otherwise you'll see Bwd and WrW failing, which is out of scope for this ticket.
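A minimal sketch of how the settings above could be applied to all four shortened configs in one go. The function run_all, the CONFIGS array, and the DRY_RUN and EXTRA_FLAGS variables are hypothetical names introduced only for illustration; they are not part of MIOpen. The snippet assumes it is sourced from the MIOpen build directory so that ./bin/test_conv2d exists.

```shell
# Hypothetical helper (not part of MIOpen): re-run the shortened configs from
# this ticket with only ConvHipImplicitGemmBwdDataV4R1Xdlops enabled.
# Assumes the current directory is the MIOpen build dir (./bin/test_conv2d).

SOLVER=ConvHipImplicitGemmBwdDataV4R1Xdlops

# Each entry is "<input dims>|<weights dims>", copied from the shortened configs.
CONFIGS=(
  "64 128 28 28|16 128 1 1"
  "64 128 14 14|16 128 1 1"
  "16 128 28 28|16 128 1 1"
  "64 256 56 56|64 256 1 1"
)

run_all() {
  local cfg input weights
  local -a cmd
  for cfg in "${CONFIGS[@]}"; do
    input=${cfg%%|*}    # text before '|'
    weights=${cfg##*|}  # text after '|'
    # $input, $weights and EXTRA_FLAGS are deliberately unquoted so that
    # each dimension / flag becomes a separate argument.
    cmd=(./bin/test_conv2d --half --input $input --weights $weights
         --pads_strides_dilations 0 0 1 1 1 1 ${EXTRA_FLAGS:-})
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Only print the command line; nothing is executed.
      echo "MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER=$SOLVER ${cmd[*]}"
    else
      env MIOPEN_FIND_MODE=normal MIOPEN_DEBUG_FIND_ONLY_SOLVER="$SOLVER" "${cmd[@]}"
    fi
  done
}
```

Running DRY_RUN=1 run_all prints the four command lines without executing anything. Additional test_conv2d options, such as the --disable-* flags from the comment above, can be appended by setting EXTRA_FLAGS before calling run_all.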

atamazov commented 2 years ago

Let's verify this (develop + latest ROCm RC) and close it if it's not reproducible.

junliume commented 2 years ago

@shaojiewang @carlushuang it seems that we are still having issues with these configs:

root@gb-sjc2-19:/root/MIOpen/build# ./bin/test_conv2d --half --input 64 256 56 56 --weights 64 256 1 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --half --cmode conv --pmode default --group-count 1 --input 64 256 56 56 --weights 64 256 1 1 --batch_size 64 --input_channels 256 --output_channels 64 --spatial_dim_elements 56 56 --filter_dims 3 3 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW
FAILED: 0.09857
Iteration: 0
Backward convolution: ConvHipImplicitGemmBwdDataV4R1Xdlops
Input tensor: 64, 256, 56, 56
Weights tensor: 64, 64, 256, 1, 1
Output tensor: 64, 64, 56, 56
Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
Max diff: 289
Mismatch at 78400: -29 != -4
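The log reports the failure as a flat element offset (Mismatch at 78400) without saying which tensor it indexes. Assuming it is an offset into a densely packed NCHW tensor, a small hypothetical helper (decode_nchw, not an MIOpen tool) can turn it into coordinates. Because 78400 is smaller than one image for either candidate shape (256 × 56 × 56 = 802816 elements for the input, 64 × 56 × 56 = 200704 for the output), both shapes give the same answer.

```shell
# Hypothetical helper: decode a flat element offset into (n, c, h, w),
# assuming a densely packed NCHW tensor with C channels and an H x W image.
decode_nchw() {
  local idx=$1 C=$2 H=$3 W=$4
  echo "n=$(( idx / (C * H * W) )) c=$(( idx / (H * W) % C )) h=$(( idx / W % H )) w=$(( idx % W ))"
}

decode_nchw 78400 256 56 56   # input shape from the log;  prints n=0 c=25 h=0 w=0
decode_nchw 78400 64 56 56    # output shape from the log; prints n=0 c=25 h=0 w=0
```

Under that layout assumption, the reported mismatch sits at the very first element of channel 25 in the first image (78400 = 25 × 56 × 56).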

carlushuang commented 2 years ago

@asroy @ltqin @zjing14 This is a legacy V4R1 HIP kernel. Can you help check it, or does it need to be retired?

junliume commented 2 years ago

./bin/test_conv2d --half --input 64 128 28 28 --weights 16 128 1 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --half --input 64 128 14 14 --weights 16 128 1 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --half --input 16 128 28 28 --weights 16 128 1 1 --pads_strides_dilations 0 0 1 1 1 1
./bin/test_conv2d --half --input 64 256 56 56 --weights 64 256 1 1 --pads_strides_dilations 0 0 1 1 1 1

@atamazov @JehandadKhan Among all listed configurations, only the last one still failed when the solver was not enforced:

./bin/test_conv2d --half --input 64 256 56 56 --weights 64 256 1 1 --pads_strides_dilations 0 0 1 1 1 1

Shall we disable the legacy V4R1 HIP kernel if @asroy @ltqin @zjing14 all agree?

atamazov commented 2 years ago

Disabling is a workaround. If we are not going to fix it, then let's remove the solver.

JehandadKhan commented 2 years ago

I agree

ppanchad-amd commented 5 months ago

@atamazov Is this ticket still relevant? Thanks!