Issue closed by causten.
Apologies for the outdated installation and run instructions. Here is the recommended way to set up and run your tests:
1) Pull and launch the dedicated ROCm AIT docker image:
    docker pull rocm/composable_kernel:ait_rocm5.3
    alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v ~/dockerx:/dockerx'
    drun rocm/composable_kernel:ait_rocm5.3
2) Inside the container, update the ROCm AIT code to the latest version:
    rm -rf AITemplate
    git clone --recursive https://github.com/ROCmSoftwarePlatform/AITemplate.git
3) Refresh the installation:
    cd AITemplate/python/
    pip3 uninstall -y aitemplate
    python3 setup.py bdist_wheel
    pip3 install dist/*.whl
4) Run BERT:
    cd ../examples/03_bert/
    HIP_VISIBLE_DEVICES=0 python3 benchmark_ait.py
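If you repeat steps 2-4 often, they can be scripted. The following is a minimal sketch, not part of AITemplate: `refresh_steps` and `run_steps` are hypothetical helpers, and it assumes you start in the directory that should hold the AITemplate checkout (e.g. /dockerx inside the container). It only prints the commands unless `dry_run=False` is passed.

```python
import os
import shlex
import subprocess

# Hypothetical script form of steps 2-4 above.
REPO_URL = "https://github.com/ROCmSoftwarePlatform/AITemplate.git"


def refresh_steps():
    """Return (working directory, argv) pairs that mirror steps 2-4."""
    py_dir = "AITemplate/python"
    return [
        (".", ["rm", "-rf", "AITemplate"]),
        (".", ["git", "clone", "--recursive", REPO_URL]),
        (py_dir, ["pip3", "uninstall", "-y", "aitemplate"]),
        (py_dir, ["python3", "setup.py", "bdist_wheel"]),
        # dist/*.whl is a shell glob, so this one step goes through sh.
        (py_dir, ["sh", "-c", "pip3 install dist/*.whl"]),
        ("AITemplate/examples/03_bert", ["python3", "benchmark_ait.py"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    # Pin to GPU 0 as in step 4; the variable is also set for the other
    # commands, which are not expected to use it.
    env = dict(os.environ, HIP_VISIBLE_DEVICES="0")
    for cwd, argv in steps:
        print(f"(cd {cwd} && {shlex.join(argv)})")
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, cwd=cwd, env=env, check=True)


if __name__ == "__main__":
    run_steps(refresh_steps(), dry_run=True)
```

Running a dry pass first makes it easy to confirm the paths before anything is deleted by the `rm -rf` step.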
After following all of these steps, I can confirm that
    HIP_VISIBLE_DEVICES=0 python3 benchmark_ait.py --batch-size 1 --seq-length 384
runs fine, but
    HIP_VISIBLE_DEVICES=0 python3 benchmark_ait.py --batch-size 1 --seq-length 384 --encoders-only False
throws the following error:
Traceback (most recent call last):
File "benchmark_ait.py", line 354, in
We will try to have this fixed before the end of the month.
Not fixed. I used the "merge_upstream" branch and it still fails:
HIP_VISIBLE_DEVICES=0 python3 benchmark_ait.py --encoders-only False
make: Entering directory '/dockerx/AITemplate/examples/03_bert/tmp/BERT_fast_gelu_1_64'
hipcc -O3 -fPIC -fvisibility=hidden -std=c++17 -w -DCK_TIME_KERNEL=0 -Xclang -mlink-builtin-bitcode -Xclang /opt/rocm/amdgcn/bitcode/oclc_abi_version_400.bc -DCK_AMD_GPU_GFX90A --amdgpu-target=gfx90a -I/usr/local/lib/python3.8/dist-packages/aitemplate/3rdparty/composable_kernel -I/usr/local/lib/python3.8/dist-packages/aitemplate/3rdparty/composable_kernel/include/ -I/usr/local/lib/python3.8/dist-packages/aitemplate/3rdparty/composable_kernel/external/include/half/ -I/usr/local/lib/python3.8/dist-packages/aitemplate/3rdparty/composable_kernel/library/include/ -I/usr/local/lib/python3.8/dist-packages/aitemplate/3rdparty/composable_kernel/profiler/include/ -I/usr/local/lib/python3.8/dist-packages/aitemplate/3rdparty/../static/include -L/opt/rocm/rocrand/lib/ -lrocrand -DNDEBUG -x hip -c -o bmm_softmax_bmm_permute_8.obj bmm_softmax_bmm_permute_8.cpp
make: Leaving directory '/dockerx/AITemplate/examples/03_bert/tmp/BERT_fast_gelu_1_64'
make stderr: bert_embeddings_0.cpp:21:159: error: template argument for template type parameter must be a type
auto device_instance = ck::tensor_operation::device::DeviceSparseEmbeddingsForwardLayernorm<ck::half_t, int64_t, ck::half_t, ck::half_t, float, ck::half_t, 256, 1, 256, 1, EMBEDDING_DIM, 1, 1, 3>{};
^~~
/usr/local/lib/python3.8/dist-packages/aitemplate/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/device/impl/device_sparse_embeddings_forward_layernorm.hpp:27:20: note: template parameter is declared here
typename EmbElementwiseOperation,
^
1 error generated when compiling for gfx90a.
make: *** [Makefile:9: bert_embeddings_0.obj] Error 1
make: *** Waiting for unfinished jobs....
2023-02-02 17:08:49,401 INFO <aitemplate.compiler.compiler> compiled the final .so file elapsed time: 0:00:36.827257
Traceback (most recent call last):
File "benchmark_ait.py", line 354, in <module>
compile_and_benchmark()
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "benchmark_ait.py", line 340, in compile_and_benchmark
mod = compile_module(
File "benchmark_ait.py", line 224, in compile_module
mod = compile_model(y, target, "./tmp", model_name)
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/compiler.py", line 260, in compile_model
module = Model(
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 227, in __init__
self.DLL = self._DLLWrapper(lib_path, num_runtimes, allocator_kind)
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 166, in __init__
self.DLL = ctypes.cdll.LoadLibrary(lib_path)
File "/usr/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: ./tmp/BERT_fast_gelu_1_64/test.so: cannot open shared object file: No such file or directory
Exception ignored in: <function Model.__del__ at 0x7efce696e0d0>
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 257, in __del__
self.close()
File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 261, in close
for ptr in list(self._allocated_ait_data):
AttributeError: 'Model' object has no attribute '_allocated_ait_data'
root@zt-dh170-13:/dockerx/AITemplate/examples/03_bert#
I was using commit...
commit 2eaed6cd171eaf4c8aeec931e74bb8bfb21cbe24 (HEAD -> merge_upstream, origin/merge_upstream)
Author: fsx950223 <fsx950223@gmail.com>
Date: Thu Feb 2 00:16:38 2023 +0800
fix a bug
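The compiler note in the log says the CK header shipped inside the installed aitemplate wheel declares a type parameter `EmbElementwiseOperation` at a position where the generated bert_embeddings_0.cpp passes something that is not a type. That suggests the embedding code generator and the bundled composable_kernel come from different revisions. The sketch below is a rough diagnostic, not an AITemplate tool: it assumes the header lives at the package-relative path shown in the log, and `installed_ck_header` and `declares_param` are hypothetical helpers that do a plain text search rather than a C++ parse.

```python
import importlib.util
from pathlib import Path

# Header path relative to the installed aitemplate package, taken from the compiler note.
CK_HEADER = Path(
    "3rdparty/composable_kernel/include/ck/tensor_operation/gpu/device/impl/"
    "device_sparse_embeddings_forward_layernorm.hpp"
)


def installed_ck_header():
    """Return the header path inside the installed aitemplate package, or None if not installed."""
    spec = importlib.util.find_spec("aitemplate")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / CK_HEADER


def declares_param(header, name="EmbElementwiseOperation"):
    """Rough text search: True if the header exists and mentions `name`."""
    header = Path(header)
    return header.is_file() and name in header.read_text(errors="replace")


if __name__ == "__main__":
    header = installed_ck_header()
    if header is None:
        print("aitemplate is not installed in this interpreter")
    else:
        print(header, "declares EmbElementwiseOperation:", declares_param(header))
```

Comparing the result with the template arguments emitted by the embedding generator (the bert_embeddings.py file linked below) shows whether the two sides agree on that parameter.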
Could you run it in a new environment? https://github.com/ROCmSoftwarePlatform/AITemplate/blob/merge_upstream/python/aitemplate/backend/rocm/embedding/bert_embeddings.py#L44
@causten I checked in my local environment, and the --encoders-only True flag works fine.
From the log, it looks very much like the CK version (in 3rdparty) is not up to date. The simplest approach is to start from a clean docker container, reinstall AIT from the beginning, and then rerun the test.
If you already have an AIT repo cloned, after updating it make sure to run git submodule update to bring all submodules to their matching versions. Also, whenever the AIT code changes, run rm -rf ~/.aitemplate to clear the cache, in case anything differs between the previous and the current AIT version.
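For an existing checkout, the two maintenance steps above (syncing submodules and clearing the cache) can be wrapped as follows. This is a minimal sketch with hypothetical helper names; it assumes the cache is ~/.aitemplate as stated above, and it adds `--init --recursive` to the plain `git submodule update`, which is an assumption beyond the original advice (those flags also initialize missing and nested submodules).

```python
import shutil
import subprocess
from pathlib import Path

# --init and --recursive go beyond the plain `git submodule update` quoted above.
SUBMODULE_CMD = ["git", "submodule", "update", "--init", "--recursive"]


def sync_submodules(repo_dir, dry_run=True):
    """Check out the submodule commits recorded in an existing AIT checkout."""
    if not dry_run:
        subprocess.run(SUBMODULE_CMD, cwd=repo_dir, check=True)
    return SUBMODULE_CMD


def clear_ait_cache(cache_dir=None):
    """Equivalent of `rm -rf ~/.aitemplate`; return True if anything was removed."""
    cache = Path(cache_dir) if cache_dir is not None else Path.home() / ".aitemplate"
    if cache.is_symlink() or cache.is_file():
        cache.unlink()  # remove the link or file itself, like rm -rf
        return True
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```

After pulling new AIT code, calling `sync_submodules("AITemplate", dry_run=False)` and then `clear_ait_cache()` before reinstalling the wheel (step 3) keeps the bundled CK sources and the cache in line with the current AIT code.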
It's "False" that I need; the default is True. But I'll repeat the process: delete everything and re-pull rocm/composable_kernel:ait_rocm5.3.
@causten I see. One more thing: if you need --encoders-only False, make sure to pass this flag both when building the model and when running it afterward. For example, first compile the model with
    HIP_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 benchmark_ait.py --encoders-only False
and then run it with
    python3 benchmark_ait.py --batch-size 1 --seq-length 384 --encoders-only False
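One way to avoid a mismatch between the build and the run is to generate both command lines from a single setting. The sketch below is illustrative only: `bert_commands` and `run_bert` are hypothetical helpers, and the only benchmark_ait.py flags it uses are the ones that appear in this thread (--batch-size, --seq-length, --encoders-only). By default it just prints the two commands.

```python
import os
import shlex
import subprocess


def bert_commands(encoders_only, batch_size=1, seq_length=384):
    """Build the compile and run command lines with the same --encoders-only value."""
    flag = ["--encoders-only", str(encoders_only)]  # str(False) -> "False"
    compile_cmd = ["python3", "benchmark_ait.py", *flag]
    run_cmd = ["python3", "benchmark_ait.py",
               "--batch-size", str(batch_size), "--seq-length", str(seq_length), *flag]
    return compile_cmd, run_cmd


def run_bert(encoders_only, compile_devices="0", run_devices="0", dry_run=True):
    """Print (and optionally execute) both commands, compiling first."""
    compile_cmd, run_cmd = bert_commands(encoders_only)
    for devices, cmd in ((compile_devices, compile_cmd), (run_devices, run_cmd)):
        print(f"HIP_VISIBLE_DEVICES={devices} {shlex.join(cmd)}")
        if not dry_run:
            env = dict(os.environ, HIP_VISIBLE_DEVICES=devices)
            subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    # Mirrors the example above: compile on GPUs 0-7, then run with batch size 1 and sequence length 384.
    run_bert(False, compile_devices="0,1,2,3,4,5,6,7", dry_run=True)
```

Because both commands share one `flag` list, changing `encoders_only` can never update one command without the other.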
It works. Thanks
commit e282ff06b56609e8a0ee8925192520f8ecce9186 rocm-5.3.0
I ran these commands inside the container...
I then waited until BS 64 and SEQ 384 were complete
I then ran
HIP_VISIBLE_DEVICES=0 python3 benchmark_ait.py --batch-size 1 --seq-length 384 --encoders-only False
Failed with
I then tried
HIP_VISIBLE_DEVICES=0 python3 benchmark_ait.py --encoders-only False
and it fails in the same way