brianrobt / python-jax-rocm

python-jax-rocm AUR package

makepkg fails with "ROCm Configuration Error: Cannot find rocm library amdhip64" #3

Closed: brianrobt closed this issue 1 week ago

brianrobt commented 1 week ago

Hello,

When building the package, I get the error "ROCm Configuration Error: Cannot find rocm library amdhip64". I'm not very familiar with ROCm and have been troubleshooting this for a while without success. I'll reply here when I find the cause.

Here is the full output of makepkg:

[~/workspace/python-jax-rocm update-sha256sum]$ makepkg -s
==> Making package: python-jaxlib-rocm 0.4.16-1 (Wed 12 Jun 2024 02:42:11 PM CDT)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
-> Found jax-jaxlib-v0.4.16.tar.gz
-> Found xla-rocm-jaxlib-v0.4.16.tar.gz
==> Validating source files with sha256sums...
jax-jaxlib-v0.4.16.tar.gz ... Passed
xla-rocm-jaxlib-v0.4.16.tar.gz ... Passed
==> Extracting sources...
-> Extracting jax-jaxlib-v0.4.16.tar.gz with bsdtar
-> Extracting xla-rocm-jaxlib-v0.4.16.tar.gz with bsdtar
==> Starting prepare()...
==> Removing existing $pkgdir/ directory...
==> Starting build()...

_   _  __  __
| | / \ \ \/ /
_  | |/ _ \ \  /
| |_| / ___ \/  \
\___/_/   \/_/\_\

b"\x1b[31mERROR: The project you're trying to build requires Bazel 6.*.* (specified in /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelversion), but it wasn't found in /usr/bin.\x1b[0m\n\nBazel binaries for all official releases can be downloaded from here:\n  https://github.com/bazelbuild/bazel/releases\n\nPlease put the downloaded Bazel binary into this location:\n  /usr/bin/bazel-6.*.*-linux-x86_64\n"
Bazel binary path: ./bazel-6.1.2-linux-x86_64
Bazel version: 6.1.2
Python binary path: /usr/bin/python
Python version: 3.12
NumPy version: 1.26.4
MKL-DNN enabled: yes
Target CPU: x86_64
Target CPU features: release
CUDA enabled: no
TPU enabled: no
ROCm enabled: yes
ROCm amdgpu targets: gfx803,gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1101,gfx1102

Building XLA and installing it in the jaxlib source tree...
./bazel-6.1.2-linux-x86_64 run --verbose_failures=true //jaxlib/tools:build_wheel -- --output_path=/home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/dist --cpu=x86_64
INFO: Options provided by the client:
Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'run' from /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'run' from /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelrc:
Inherited 'build' options: --nocheck_visibility --apple_platform_type=macos --macos_minimum_os=10.14 --announce_rc --define open_source_build=true --spawn_strategy=standalone --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true --define=tsl_link_protobuf=true -c opt --config=short_logs --copt=-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir. --@xla//xla/python:enable_gpu=false
INFO: Reading rc options for 'run' from /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.jax_configure.bazelrc:
Inherited 'build' options: --strategy=Genrule=standalone --repo_env PYTHON_BIN_PATH=/usr/bin/python --action_env=PYENV_ROOT --python_path=/usr/bin/python --override_repository=xla=/home/brian/workspace/python-jax-rocm/src/xla-rocm-jaxlib-v0.4.16 --config=avx_posix --config=mkl_open_source_only --config=rocm
INFO: Found applicable config definition build:short_logs in file /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:avx_posix in file /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelrc: --copt=-mavx --host_copt=-mavx
INFO: Found applicable config definition build:mkl_open_source_only in file /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelrc: --define=tensorflow_mkldnn_contraction_kernel=1
INFO: Found applicable config definition build:rocm in file /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm=true --define=using_rocm_hipcc=true --@xla//xla/python:enable_gpu=true --define=xla_python_enable_gpu=true --repo_env TF_NEED_ROCM=1 --action_env TF_ROCM_AMDGPU_TARGETS=gfx900,gfx906,gfx908,gfx90a,gfx1030
INFO: Found applicable config definition build:rocm in file /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.jax_configure.bazelrc: --action_env TF_ROCM_AMDGPU_TARGETS=gfx803,gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1101,gfx1102
INFO: Found applicable config definition build:linux in file /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelrc: --config=posix --copt=-Wno-unknown-warning-option --copt=-Wno-stringop-truncation --copt=-Wno-array-parameter
INFO: Found applicable config definition build:posix in file /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/.bazelrc: --copt=-fvisibility=hidden --copt=-Wno-sign-compare --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
Loading:
DEBUG: /home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/xla/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'tf_runtime' because it already exists.
DEBUG: /home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/xla/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'llvm-raw' because it already exists.
INFO: Repository local_config_rocm instantiated at:
/home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/WORKSPACE:15:15: in <toplevel>
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/xla/workspace2.bzl:90:19: in workspace
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/workspace2.bzl:626:19: in workspace
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/workspace2.bzl:76:19: in _tf_toolchains
Repository rule rocm_configure defined at:
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl:855:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_rocm':
Traceback (most recent call last):
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 836, column 38, in _rocm_autoconf_impl
_create_local_rocm_repository(repository_ctx)
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 625, column 27, in _create_local_rocm_repository
rocm_libs = _find_libs(repository_ctx, rocm_config, hipfft_or_rocfft, miopen_path, rccl_path, bash_bin)
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 366, column 34, in _find_libs
return _select_rocm_lib_paths(repository_ctx, libs_paths, bash_bin)
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 328, column 36, in _select_rocm_lib_paths
auto_configure_fail("Cannot find rocm library %s" % name)
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 153, column 9, in auto_configure_fail
fail("\n%sROCm Configuration Error:%s %s\n" % (red, no_color, msg))
Error in fail:
ROCm Configuration Error: Cannot find rocm library amdhip64
ERROR: /home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/WORKSPACE:15:15: fetching rocm_configure rule //external:local_config_rocm: Traceback (most recent call last):
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 836, column 38, in _rocm_autoconf_impl
_create_local_rocm_repository(repository_ctx)
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 625, column 27, in _create_local_rocm_repository
rocm_libs = _find_libs(repository_ctx, rocm_config, hipfft_or_rocfft, miopen_path, rccl_path, bash_bin)
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 366, column 34, in _find_libs
return _select_rocm_lib_paths(repository_ctx, libs_paths, bash_bin)
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 328, column 36, in _select_rocm_lib_paths
auto_configure_fail("Cannot find rocm library %s" % name)
File "/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/gpus/rocm_configure.bzl", line 153, column 9, in auto_configure_fail
fail("\n%sROCm Configuration Error:%s %s\n" % (red, no_color, msg))
Error in fail:
ROCm Configuration Error: Cannot find rocm library amdhip64
INFO: Repository rules_cc instantiated at:
/home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/WORKSPACE:18:15: in <toplevel>
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/xla/workspace1.bzl:12:19: in workspace
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/workspace1.bzl:17:28: in workspace
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/rules_cuda/cuda/dependencies.bzl:72:18: in rules_cuda_dependencies
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/rules_cuda/cuda/dependencies.bzl:35:17: in _rules_cc
Repository rule http_archive defined at:
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/bazel_tools/tools/build_defs/repo/http.bzl:372:31: in <toplevel>
INFO: Repository rules_python instantiated at:
/home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/WORKSPACE:15:15: in <toplevel>
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/xla/workspace2.bzl:90:19: in workspace
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/workspace2.bzl:636:21: in workspace
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/workspace2.bzl:528:20: in _tf_repositories
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
/home/brian/.cache/bazel/_bazel_brian/fb67b6bd5fdb81965dfcaa425bd9dfea/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
ERROR: Skipping '@xla//xla/python:enable_gpu': no such package '@local_config_rocm//rocm':
ROCm Configuration Error: Cannot find rocm library amdhip64
WARNING: Target pattern parsing failed.
ERROR: @xla//xla/python:enable_gpu :: Error loading option @xla//xla/python:enable_gpu: no such package '@local_config_rocm//rocm':
ROCm Configuration Error: Cannot find rocm library amdhip64
Traceback (most recent call last):
File "/home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/build/build.py", line 591, in <module>
main()
File "/home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/build/build.py", line 572, in main
shell(command)
File "/home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/build/build.py", line 53, in shell
output = subprocess.check_output(cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 466, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['./bazel-6.1.2-linux-x86_64', 'run', '--verbose_failures=true', '//jaxlib/tools:build_wheel', '--', '--output_path=/home/brian/workspace/python-jax-rocm/src/jax-jaxlib-v0.4.16/dist', '--cpu=x86_64']' returned non-zero exit status 2.
==> ERROR: A failure occurred in build().
Aborting...

brianrobt commented 1 week ago

After some more digging, I was able to get past the error. The same error then occurred for roctracer64. Both libraries (libamdhip64 and libroctracer.so) were already installed in /opt/rocm/lib. However, rocm_configure.bzl was looking for them under /opt/rocm/hip and /opt/rocm/roctracer, respectively.
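
For anyone hitting the same mismatch, here is a minimal diagnostic sketch in Python. It is not part of the package or of rocm_configure.bzl. It lists where each library actually lives under the ROCm root and reports whether any copy sits under the subdirectory named in the comment above. The helper names `find_library` and `report` are hypothetical. The `lib<name>.so*` file pattern and the `/opt/rocm` root are assumptions about the local install.

```python
from pathlib import Path


def find_library(rocm_root, name):
    """Return all files under rocm_root named lib<name>.so* (assumed naming)."""
    root = Path(rocm_root)
    if not root.is_dir():
        return []
    # rglob walks the real directory tree; symlinked directories may not be
    # traversed, so a copy reachable only through a symlink can be missed.
    return sorted(root.rglob(f"lib{name}.so*"))


def report(rocm_root, expected_subdirs):
    """Map each library name to (paths found, True if one is under the expected subdir)."""
    result = {}
    for name, subdir in expected_subdirs.items():
        found = find_library(rocm_root, name)
        expected = Path(rocm_root) / subdir
        in_expected = any(expected in path.parents for path in found)
        result[name] = (found, in_expected)
    return result


if __name__ == "__main__":
    # Subdirectories as described in this thread; adjust for your ROCm layout.
    expected = {"amdhip64": "hip", "roctracer64": "roctracer"}
    for name, (found, ok) in report("/opt/rocm", expected).items():
        status = "in expected location" if ok else "NOT in expected location"
        print(f"{name}: {status}")
        for path in found:
            print(f"  {path}")
```

If a library is reported as "NOT in expected location" but does show up under /opt/rocm/lib, that matches the situation described above.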

acxz commented 1 week ago

Hello @brianrobt, I noticed that you picked up maintainership for this package on the AUR. Would you be okay if I transfer over this repo to you?

I tried to do the transfer, but as you have this repo forked, the transfer fails with "Repository brianrobt/python-jax-rocm already exists". I think you'll have to remove or rename your fork.

brianrobt commented 1 week ago

@acxz Sure thing, feel free to transfer it over. I'll delete the fork after this post.

acxz commented 1 week ago

sent your way! take good care of her.

brianrobt commented 1 week ago

v0.4.16 was a bit outdated. When building the latest version, v0.4.29, this issue doesn't occur.