Closed: hubertlu-tw closed this 2 years ago.
Installation:
python setup.py install --cpp_ext --cuda_ext --distributed_adam --xentropy --deprecated_fused_adam --fast_multihead_attn 2>&1 | tee ../apex_build_mha.log
Unit tests (each command run from the apex repository root):
cd tests/L0/ && bash run_rocm.sh 2>&1 | tee ../../apex_unittests.txt
cd tests/distributed/ && bash run_rocm_distributed.sh 2>&1 | tee ../../apex_distributed_unittests.txt
Note: I built apex on an MI200 server and confirmed that no new unit-test failures were introduced.
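Both test commands pipe through `tee`, so the exit status a shell reports is `tee`'s (almost always 0), not the test script's. A minimal sketch, assuming bash, of a hypothetical `run_suite` helper that runs each suite from its own directory in a subshell with `pipefail`, so a failing suite is reported as a failure and the `cd` does not carry over to the next command:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of apex): run one test-suite script from its
# own directory, mirror its output to a log file, and return the script's
# exit status instead of tee's.
run_suite() {
  local dir="$1" script="$2" log="$3"
  # Resolve a relative log path against the caller's directory (the repo root),
  # matching the original ../../<log> written from tests/<suite>/.
  case "$log" in
    /*) ;;
    *) log="$PWD/$log" ;;
  esac
  # Subshell: the cd and pipefail setting do not leak into the caller's shell.
  (
    set -o pipefail
    cd "$dir" || exit 1
    bash "$script" 2>&1 | tee "$log"
  )
}

# Usage from the repository root:
#   run_suite tests/L0 run_rocm.sh apex_unittests.txt
#   run_suite tests/distributed run_rocm_distributed.sh apex_distributed_unittests.txt
```

Because each call runs in a subshell, the two suites can be invoked back to back from the same shell without the second `cd` resolving against `tests/L0/`.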
Unit tests for the extension (https://github.com/ROCmSoftwarePlatform/apex/tree/dev/hubertlu/multihead_attn/apex/contrib/test/multihead_attn):
| Test | CUDA | ROCm |
| -- | -- | -- |
| test_encdec_multihead_attn.py | PASS | PASS |
| test_encdec_multihead_attn_norm_add.py | PASS | PASS |
| test_fast_self_multihead_attn_bias.py | PASS | **FAILED** |
| test_mha_fused_softmax.py | PASS | PASS |
| test_self_multihead_attn.py | PASS | PASS |
| test_self_multihead_attn_norm_add.py | FAILED | FAILED |
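To reproduce one column of the table above, the sketch below runs each listed test file and prints rows in the same PASS/FAILED notation. It is a hedged sketch under stated assumptions: bash, `pytest` installed and able to collect these unittest-style files, and the test directory `apex/contrib/test/multihead_attn` taken from the linked branch. `run_one_test` and `summarize_tests` are hypothetical helper names, and the output covers only the platform it runs on (CUDA or ROCm).

```shell
#!/usr/bin/env bash
# Assumption: pytest is installed and can collect these unittest-style files.
# Hypothetical helper: returns 0 if the file's tests pass, nonzero otherwise.
run_one_test() {
  python -m pytest -q "$1" > /dev/null 2>&1
}

# Hypothetical helper: run each test file from the given directory and print
# one Markdown table row per file for the current platform.
summarize_tests() {
  local dir="$1"; shift
  local f
  for f in "$@"; do
    # Subshell so the cd does not affect the caller or the next iteration.
    if (cd "$dir" && run_one_test "$f"); then
      printf '| %s | PASS |\n' "$f"
    else
      printf '| %s | FAILED |\n' "$f"
    fi
  done
}

# Usage from the repository root:
#   summarize_tests apex/contrib/test/multihead_attn \
#     test_encdec_multihead_attn.py test_encdec_multihead_attn_norm_add.py \
#     test_fast_self_multihead_attn_bias.py test_mha_fused_softmax.py \
#     test_self_multihead_attn.py test_self_multihead_attn_norm_add.py
```

Keeping the runner in its own function makes it easy to swap `pytest` for another launcher if these files are meant to be run directly with `python`.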