sryap closed this pull request 3 years ago.
This is great. The only issue I see is that the hipification is done using a lot of ifdefs, even for code that should be handled by hipify. @jithunnair-amd Do you agree?
I think we can merge this if @sarunyap can add a test that proves correctness for the extension; the scripts you provided through email should be fine. We should probably add it to the CI as well.
I can come in later and clean it up to use hipify more and limit the ifdefs to chunks of code that could not be hipified.
@micmelesse I have added a unit test for group BN. Please let me know if you have more comments. Thanks
@sarunyap I tried the unit test and it looks good.
There are some unit test failures. I am not quite sure whether they are related to this PR. @hubertlu-tw Can you take a look at the failures and tell us whether they are expected?
I'm not able to reply to https://github.com/ROCmSoftwarePlatform/apex/pull/51#discussion_r699870260 for some reason
@jithunnair-amd
I think most of the above can be added to the hipify mappings file, except a couple like CUDNN_TENSOR_NHWC->0 and cudnnTensorFormat_t->int. @sarunyap Could you please elaborate on these two mappings? Are they done because MIOpen doesn't have any equivalent types? Would these mappings be correct in the general case or only for the use-case here?
So I'm replying here instead:
Our MIOpen has only one tensor format in the released version (as far as I know); it does not support NHWC yet. So miopenTensorFormat_t and MIOPEN_TENSOR_NHWC do not exist yet. That's why I mapped cudnnTensorFormat_t to int and CUDNN_TENSOR_NHWC to 0.
I conducted A/B testing for this PR in the rocm4.3_ubuntu18.04_py3.6_pytorch Docker container and compared with the current master branch.
=== Test A === Apex built from ROCmSoftwarePlatform:master with the following command:
python3.6 setup.py install --cuda_ext --cpp_ext
The log file for building Apex can be found in apex_install_A.txt. The log files for the unit tests can be found in apex_unittests_A.txt and apex_distributed_unittests_A.txt.
=== Test B === First, merged sarunyap:group-bn-hip into ROCmSoftwarePlatform:master. Then built Apex with the following command:
python3.6 setup.py install --cuda_ext --cpp_ext --bnp
The log file for building Apex can be found in apex_install_bnp_merge_B.txt. The log files for the unit tests can be found in apex_unittests_B.txt and apex_distributed_unittests_B.txt.
In summary, the unit test failures seen with this PR are expected. Please correct me if I missed something in the A/B testing.
If I understand correctly, the failed tests are not related to this PR, right?
Yes, the failed tests are not related to this PR.
@hubertlu-tw Thanks for confirming.
Enable NHWC group batch norm on a single GPU on ROCm (bn_group = 1). The multi-GPU case (bn_group > 1) will be revisited in the future.
The following are the main changes:
1) Use MIOpen data structures/functions for HIP instead of cuDNN.
2) For the warp-level primitive code on HIP, ensure that the code operates on a 64-thread-wide warp instead of a 32-thread-wide one.
3) Disable all bn_group > 1 paths for HIP.
4) Make all data pointers contiguous to support non-contiguous input tensors.
5) Use a 64-bit ReLU bitmask for HIP.
6) For the fusion kernels, rectify output data that is less than or equal to zero, to improve the stability of the kernels.
Notes:
1) Multi-stream is not tested.
2) Performance has not been optimized.