Closed xinyazhang closed 2 weeks ago
Dockerfile tested locally and can build 0.7.1b
Compiler switch tested locally and can run test pytest ../test/test_backward.py -k 1.2 -v -x
@ethanwee1 Can you please validate this PR by trying to build aotriton 0.7b and 0.7.1b using the Dockerfile in this PR? Please make sure that it generates a tarball with the following files:
$ tar tvf output/aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz
drwxr-xr-x root/root 0 2024-10-29 09:11 aotriton/
drwxr-xr-x root/root 0 2024-10-29 09:11 aotriton/lib/
-rw-r--r-- root/root 382452768 2024-10-29 09:11 aotriton/lib/libaotriton_v2.so
drwxr-xr-x root/root 0 2024-10-29 09:11 aotriton/include/
drwxr-xr-x root/root 0 2024-10-29 09:11 aotriton/include/aotriton/
drwxr-xr-x root/root 0 2024-10-29 09:11 aotriton/include/aotriton/_internal/
-rw-r--r-- root/root 1490 2024-10-29 08:48 aotriton/include/aotriton/_internal/triton_kernel.h
-rw-r--r-- root/root 592 2024-10-29 08:48 aotriton/include/aotriton/_internal/util.h
-rw-r--r-- root/root 566 2024-10-29 08:48 aotriton/include/aotriton/cpp_tune.h
-rw-r--r-- root/root 422 2024-10-29 08:48 aotriton/include/aotriton/dtypes.h
-rw-r--r-- root/root 5435 2024-10-29 08:48 aotriton/include/aotriton/flash.h
-rw-r--r-- root/root 695 2024-10-29 08:48 aotriton/include/aotriton/runtime.h
-rw-r--r-- root/root 3316 2024-10-29 08:48 aotriton/include/aotriton/util.h
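The file check requested above could be automated with a small script. The following is only a sketch: the function name `missing_members` is hypothetical, and the expected file list is copied from the listing above rather than taken from any official manifest.

```python
import tarfile

# Regular files expected in the tarball, copied from the `tar tvf` listing above.
EXPECTED_MEMBERS = {
    "aotriton/lib/libaotriton_v2.so",
    "aotriton/include/aotriton/_internal/triton_kernel.h",
    "aotriton/include/aotriton/_internal/util.h",
    "aotriton/include/aotriton/cpp_tune.h",
    "aotriton/include/aotriton/dtypes.h",
    "aotriton/include/aotriton/flash.h",
    "aotriton/include/aotriton/runtime.h",
    "aotriton/include/aotriton/util.h",
}


def missing_members(tarball_path, expected=EXPECTED_MEMBERS):
    """Return the sorted list of expected regular files absent from the tarball."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(tarball_path, "r:*") as tar:
        present = {m.name for m in tar.getmembers() if m.isfile()}
    return sorted(set(expected) - present)
```

Calling `missing_members("output/aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz")` should return an empty list when every file from the listing is present.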
Validated 0.7.1b and 0.7b after following build.sh
aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz
aotriton-0.7b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz
@xinyazhang Taking this out of Draft mode as it is ready to merge (even if queued) based on Ethan's validation
@jithunnair-amd nope, a queued PR should not be taken out of draft because it's based on previous work (for this PR specifically, it's https://github.com/ROCm/aotriton/pull/50)
Tested with these commands to build
aotriton-e278d4a853170c7a9063cfe847419414cb7b62b6-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz
Commands:
git clone https://github.com/ROCm/aotriton.git
cd aotriton/
git checkout xinyazhang/manylinux_2_28-dockerfile
cd dockerfile/
export AMDGPU_INSTALLER=https://repo.radeon.com/amdgpu-install/6.2.4/el/8.10/amdgpu-install-6.2.60204-1.el8.noarch.rpm
mkdir -p output
TRITON_LLVM_HASH="b5cc222d" bash build.sh input tmpfs output e278d4a853170c7a9063cfe847419414cb7b62b6 "MI300X;MI200" 2>&1 | tee buildlog2.log
tar tvf output/*.tar*
Output:
Size: 107MB
aotriton-e278d4a853170c7a9063cfe847419414cb7b62b6-manylinux_2_28_x86_64-rocm6.2-shared.txt
buildlog2.log
Some notable lines in the log:
-rw-r--r-- root/root 13064056 2024-11-13 18:01 aotriton/lib/libaotriton_v2.so
: libaotriton_v2.so reduced to 13MB
aotriton/lib/aotriton.images/amd-gfx90a/flash/attn_fwd/FONLY__^bf16@16,False,128,False,False,False,0___MI200.aks2
13 MiB is also significant considering its functionality. I believe most of the size comes from the generated dispatching code. I've added this into the keepbook, but there is no concrete plan to implement it.
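To see how the tarball size splits between the shared library and the kernel images, a per-component size summary can help. This is a rough sketch: `size_by_component` is a hypothetical helper, and the `aotriton/lib/aotriton.images` layout is assumed only from the log line quoted above.

```python
import tarfile
from collections import defaultdict


def size_by_component(tarball_path, depth=3):
    """Sum regular-file sizes in bytes, grouped by the first `depth` path components."""
    totals = defaultdict(int)
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                # e.g. "aotriton/lib/aotriton.images/amd-gfx90a/..." -> "aotriton/lib/aotriton.images"
                key = "/".join(member.name.split("/")[:depth])
                totals[key] += member.size
    return dict(totals)
```

With the default `depth=3`, `aotriton/lib/libaotriton_v2.so` gets its own entry and everything under `aotriton/lib/aotriton.images` is summed into one, so the two can be compared directly.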
Major Changes:
dockerfile/build.sh: script that builds AOTriton from the official AlmaLinux 8 docker image.
hipcc is not compatible with SCL. scl enable gcc-toolset-13 "/opt/rocm/bin/hipcc -v" is supposed to show "Found candidate GCC installation" but nothing is displayed, and eventually this triggers https://github.com/ROCm/ROCm/issues/1843 during compiling.
Minor Changes:
pyaotriton: register hipError_t locally to avoid the "hipError_t is already registered" bug.
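The hipcc/SCL symptom described under Major Changes can be checked programmatically. The sketch below assumes `scl` and `/opt/rocm/bin/hipcc` exist in the build container; both function names are hypothetical, and the marker string is the one quoted in the description.

```python
import subprocess

# Message the description says should appear in `hipcc -v` output.
MARKER = "Found candidate GCC installation"


def found_gcc_candidate(hipcc_verbose_output):
    """True if the verbose compiler output mentions a candidate GCC installation."""
    return MARKER in hipcc_verbose_output


def hipcc_sees_gcc_under_scl(toolset="gcc-toolset-13", hipcc="/opt/rocm/bin/hipcc"):
    """Run `hipcc -v` under `scl enable` and look for the marker in stdout and stderr."""
    result = subprocess.run(
        ["scl", "enable", toolset, f"{hipcc} -v"],
        capture_output=True,
        text=True,
        check=False,  # only the text matters here, not the exit status
    )
    return found_gcc_candidate(result.stdout + result.stderr)
```

If `hipcc_sees_gcc_under_scl()` returns `False` inside the container, that matches the behaviour reported above, where nothing is displayed.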