ROCm / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Enable multihead atten #56

Closed hubertlu-tw closed 2 years ago

hubertlu-tw commented 2 years ago

Installation:

```shell
python setup.py install --cpp_ext --cuda_ext --distributed_adam --xentropy --deprecated_fused_adam --fast_multihead_attn 2>&1 | tee ../apex_build_mha.log
```
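After a build like the one above, it can be useful to check which compiled extension modules are actually importable. This is a minimal sketch; the module names below are assumptions inferred from the build flags (`--fast_multihead_attn`, `--xentropy`, `--distributed_adam`), not verified against `setup.py`:

```python
import importlib.util

# Candidate extension module names (assumed from the build flags above).
ext_modules = ["fast_multihead_attn", "xentropy_cuda", "distributed_adam_cuda"]

report = {}
for name in ext_modules:
    # find_spec returns None when the module is not importable,
    # without actually importing (and thus initializing) it.
    report[name] = importlib.util.find_spec(name) is not None

for name, built in report.items():
    print(f"{name}: {'built' if built else 'not found'}")
```

Running this in the build environment gives a quick sanity check before launching the full unit-test suites.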

Unit tests

```shell
cd tests/L0/ && bash run_rocm.sh 2>&1 | tee ../../apex_unittests.txt
cd tests/distributed/ && bash run_rocm_distributed.sh 2>&1 | tee ../../apex_distributed_unittests.txt
```

Note that I built apex on an MI200 server and confirmed that no new failing unit tests are introduced.

Unit tests for the extension

(https://github.com/ROCmSoftwarePlatform/apex/tree/dev/hubertlu/multihead_attn/apex/contrib/test/multihead_attn)

| Test | CUDA | ROCm |
| -- | -- | -- |
| test_encdec_multihead_attn.py | PASS | PASS |
| test_encdec_multihead_attn_norm_add.py | PASS | PASS |
| test_fast_self_multihead_attn_bias.py | PASS | **FAILED** |
| test_mha_fused_softmax.py | PASS | PASS |
| test_self_multihead_attn.py | PASS | PASS |
| test_self_multihead_attn_norm_add.py | FAILED | FAILED |

Note that the feature exercised by the failing test in test_fast_self_multihead_attn_bias.py on ROCm is not used by the MLPerf team; we will need to root-cause it later. The failing test in test_self_multihead_attn_norm_add.py is due to a missing required positional argument in the upstream (NVIDIA) test script.
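For context on what a test like test_mha_fused_softmax.py checks, the fused kernel's output is compared against a plain softmax reference. A minimal, stdlib-only sketch of such a reference (not the actual test code) looks like:

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    # so large logits do not overflow math.exp.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# The fused-kernel test would compare kernel output against a reference
# like this, elementwise, within a floating-point tolerance.
probs = softmax([1.0, 2.0, 3.0])
print([round(p, 4) for p in probs])
```

In the actual unit tests the comparison is done with tolerance-based tensor checks rather than exact equality, since the fused kernel accumulates in a different order.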

Lastly, the current CI checks do not run the unit tests for extensions (such as groupbn, layer_norm, multihead_attn, and test_label_smoothing.py). We will need to add them to our CI checks later.
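As a sketch of what such a CI hook could look like, standard unittest discovery can collect the contrib extension tests the same way the run_rocm.sh scripts drive the L0 suites. The temp-directory test file below is a hypothetical stand-in for apex/contrib/test/multihead_attn, since the real tests need the compiled extensions:

```python
import os
import tempfile
import textwrap
import unittest

# Create a throwaway test file standing in for the contrib test directory.
test_dir = tempfile.mkdtemp()
with open(os.path.join(test_dir, "test_example.py"), "w") as f:
    f.write(textwrap.dedent("""\
        import unittest

        class SmokeTest(unittest.TestCase):
            def test_ok(self):
                self.assertTrue(True)
        """))

# Discover and run everything matching test_*.py under the directory,
# which is the pattern a CI job could apply to apex/contrib/test/.
suite = unittest.defaultTestLoader.discover(test_dir, pattern="test_*.py")
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("tests run:", result.testsRun, "failures:", len(result.failures))
```

A CI step would point the discovery at each extension's test directory and fail the job when `result.wasSuccessful()` is false.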