ROCm / aotriton

Ahead of Time (AOT) Triton Math Library
MIT License

Add docker based package builder and switch to system compiler #51

Closed xinyazhang closed 2 weeks ago

xinyazhang commented 1 month ago

Major Changes:

Minor Changes

xinyazhang commented 1 month ago

Dockerfile tested locally and can build 0.7.1b. Compiler switch tested locally and can run the test: pytest ../test/test_backward.py -k 1.2 -v -x

jithunnair-amd commented 1 month ago

@ethanwee1 Can you please validate this PR by trying to build aotriton 0.7b and 0.7.1b using the Dockerfile in this PR? Please make sure that it generates a tarball with the following files:

$ tar tvf output/aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz 
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/lib/
-rw-r--r-- root/root 382452768 2024-10-29 09:11 aotriton/lib/libaotriton_v2.so
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/include/
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/include/aotriton/
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/include/aotriton/_internal/
-rw-r--r-- root/root      1490 2024-10-29 08:48 aotriton/include/aotriton/_internal/triton_kernel.h
-rw-r--r-- root/root       592 2024-10-29 08:48 aotriton/include/aotriton/_internal/util.h
-rw-r--r-- root/root       566 2024-10-29 08:48 aotriton/include/aotriton/cpp_tune.h
-rw-r--r-- root/root       422 2024-10-29 08:48 aotriton/include/aotriton/dtypes.h
-rw-r--r-- root/root      5435 2024-10-29 08:48 aotriton/include/aotriton/flash.h
-rw-r--r-- root/root       695 2024-10-29 08:48 aotriton/include/aotriton/runtime.h
-rw-r--r-- root/root      3316 2024-10-29 08:48 aotriton/include/aotriton/util.h
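
The file check requested above could be automated along these lines. This is only a minimal sketch: the helper name `missing_files` and the `EXPECTED_FILES` set are hypothetical and not part of aotriton; the paths are copied from the listing above, with directory entries left out.

```python
import tarfile

# Regular files copied from the tar listing above (directories omitted).
# EXPECTED_FILES and missing_files are illustrative names, not aotriton APIs.
EXPECTED_FILES = {
    "aotriton/lib/libaotriton_v2.so",
    "aotriton/include/aotriton/_internal/triton_kernel.h",
    "aotriton/include/aotriton/_internal/util.h",
    "aotriton/include/aotriton/cpp_tune.h",
    "aotriton/include/aotriton/dtypes.h",
    "aotriton/include/aotriton/flash.h",
    "aotriton/include/aotriton/runtime.h",
    "aotriton/include/aotriton/util.h",
}


def missing_files(tarball_path, expected=EXPECTED_FILES):
    """Return the sorted list of expected paths that are absent from the tarball."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(tarball_path, "r:*") as tar:
        present = {member.name for member in tar.getmembers() if member.isfile()}
    return sorted(set(expected) - present)
```

Calling `missing_files("output/aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz")` would return an empty list when every file in the listing is present, and the names of any missing files otherwise. It checks presence only, not file sizes or timestamps.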

ethanwee1 commented 1 month ago

Validated 0.7.1b and 0.7b after following build.sh

aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz 
aotriton-0.7b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz

jithunnair-amd commented 3 weeks ago

@xinyazhang Taking this out of Draft mode as it is ready to merge (even if queued) based on Ethan's validation

xinyazhang commented 3 weeks ago

@jithunnair-amd Nope, a queued PR should not be taken out of draft because it is based on previous work (for this PR specifically, it's https://github.com/ROCm/aotriton/pull/50)

ethanwee1 commented 2 weeks ago

Tested with these commands to build aotriton-e278d4a853170c7a9063cfe847419414cb7b62b6-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz

Commands:

git clone https://github.com/ROCm/aotriton.git
cd aotriton/
git checkout xinyazhang/manylinux_2_28-dockerfile
cd dockerfile/
export AMDGPU_INSTALLER=https://repo.radeon.com/amdgpu-install/6.2.4/el/8.10/amdgpu-install-6.2.60204-1.el8.noarch.rpm
mkdir -p output
TRITON_LLVM_HASH="b5cc222d" bash build.sh input tmpfs output e278d4a853170c7a9063cfe847419414cb7b62b6 "MI300X;MI200" 2>&1 | tee buildlog2.log
tar tvf output/*.tar*

Output: Size: 107MB aotriton-e278d4a853170c7a9063cfe847419414cb7b62b6-manylinux_2_28_x86_64-rocm6.2-shared.txt buildlog2.log
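
The tarball names quoted in this thread appear to share one layout: a version or commit hash, a manylinux platform tag, a ROCm version, and a "shared" suffix. The sketch below splits a name into those parts. The pattern is inferred only from the names seen here and is an assumption, not a documented aotriton convention; `parse_artifact_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the file names in this thread (assumption, not a spec):
#   aotriton-<version or commit>-<manylinux tag>-rocm<version>-shared.tar.gz
ARTIFACT_RE = re.compile(
    r"^aotriton-(?P<version>[^-]+)"
    r"-(?P<platform>manylinux_\d+_\d+_[A-Za-z0-9_]+)"
    r"-rocm(?P<rocm>\d+(?:\.\d+)*)"
    r"-shared\.tar\.gz$"
)


def parse_artifact_name(filename):
    """Return a dict with version, platform and rocm, or None if the name does not match."""
    match = ARTIFACT_RE.match(filename)
    return match.groupdict() if match else None
```

For example, `parse_artifact_name("aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz")` yields version `0.7.1b`, platform `manylinux_2_28_x86_64`, and rocm `6.2`; a commit-hash build such as the one above is reported with the hash in the `version` field.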

jithunnair-amd commented 2 weeks ago

Some notable lines in the log:

xinyazhang commented 2 weeks ago

13 MiB is also significant considering its functionality. I believe most of the size comes from the generated dispatching code. I've added this into the keepbook, but there is no concrete plan to implement it.