alpa-projects / alpa

Training and serving large-scale neural networks with auto parallelization.
https://alpa.ai
Apache License 2.0

Install Alpa without GPU #943

Status: Open. AlbertZhangHIT opened this issue 1 year ago.

AlbertZhangHIT commented 1 year ago

**Please describe the bug**
The Alpa-modified jaxlib cannot be built in a CPU-only environment.

**Please describe the expected behavior**
The Alpa-modified jaxlib builds and installs in a CPU-only environment.

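**System information and environment**
Python 3.7, NumPy 1.21.6, Bazel 5.1.1, x86_64 CPU, CUDA disabled (taken from the build configuration printed below).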

**To Reproduce**
Steps to reproduce the behavior:

  1. `cd build_jaxlib`
  2. `python build/build.py --dev_install --bazel_options=--override_repository=org_tensorflow=$(pwd)/../third_party/tensorflow-alpa`
  3. See the error: the Alpa jaxlib build fails without a GPU:
    
```
Bazel binary path: ./bazel-5.1.1-linux-x86_64
Bazel version: 5.1.1
Python binary path: /usr/bin/python
Python version: 3.7
NumPy version: 1.21.6
MKL-DNN enabled: yes
Target CPU: x86_64
Target CPU features: release
CUDA enabled: no
TPU enabled: no
Remote TPU enabled: no
ROCm enabled: no
Plugin device enabled: no

Building XLA and installing it in the jaxlib source tree...
./bazel-5.1.1-linux-x86_64 run --verbose_failures=true :build_wheel -- --output_path=/home/stack/softwares/alpa/build_jaxlib/dist --cpu=x86_64 --dev_install
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
  Inherited 'build' options: --apple_platform_type=macos --macos_minimum_os=10.14 --announce_rc --define open_source_build=true --spawn_strategy=standalone --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true -c opt --config=short_logs --copt=-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir. --@org_tensorflow//tensorflow/compiler/xla/python:enable_gpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_tpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_plugin_device=false
INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.jax_configure.bazelrc:
  Inherited 'build' options: --strategy=Genrule=standalone --repo_env PYTHON_BIN_PATH=/usr/bin/python --action_env=PYENV_ROOT --python_path=/usr/bin/python --distinct_host_configuration=false --override_repository=org_tensorflow=/home/stack/softwares/alpa/build_jaxlib/../third_party/tensorflow-alpa --config=avx_posix --config=mkl_open_source_only
INFO: Found applicable config definition build:short_logs in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:avx_posix in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --copt=-mavx --host_copt=-mavx
INFO: Found applicable config definition build:mkl_open_source_only in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --define=tensorflow_mkldnn_contraction_kernel=1
INFO: Found applicable config definition build:linux in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --config=posix --copt=-Wno-unknown-warning-option --copt=-Wno-stringop-truncation --copt=-Wno-array-parameter
INFO: Found applicable config definition build:posix in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --copt=-fvisibility=hidden --copt=-Wno-sign-compare --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
  Inherited 'build' options: --apple_platform_type=macos --macos_minimum_os=10.14 --announce_rc --define open_source_build=true --spawn_strategy=standalone --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true -c opt --config=short_logs --copt=-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir. --@org_tensorflow//tensorflow/compiler/xla/python:enable_gpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_tpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_plugin_device=false
ERROR: @org_tensorflow//tensorflow/compiler/xla/python:enable_gpu :: Error loading option @org_tensorflow//tensorflow/compiler/xla/python:enable_gpu: error loading package '': Every .bzl file must have a corresponding package, but '//third_party/ducc:workspace.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.
b''
Traceback (most recent call last):
  File "build/build.py", line 580, in <module>
    main()
  File "build/build.py", line 575, in main
    shell(command)
  File "build/build.py", line 53, in shell
    output = subprocess.check_output(cmd)
  File "/usr/lib64/python3.7/subprocess.py", line 421, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.7/subprocess.py", line 522, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['./bazel-5.1.1-linux-x86_64', 'run', '--verbose_failures=true', ':build_wheel', '--', '--output_path=/home/stack/softwares/alpa/build_jaxlib/dist', '--cpu=x86_64', '--dev_install']' returned non-zero exit status 2.
```
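The Bazel error complains that a `.bzl` file has no `BUILD` file "in the same or any parent directory". Below is a minimal Python sketch for spotting such files before rerunning the build. `find_orphan_bzl_files` and `has_package` are hypothetical helpers, not part of Alpa, JAX, or Bazel. The check only mirrors the wording of the error message: it treats `BUILD` and `BUILD.bazel` as package markers and stops at whatever directory you pass as the workspace root.

```python
import os
from pathlib import Path
from typing import List

# File names that mark a directory as a Bazel package.
BUILD_FILE_NAMES = ("BUILD", "BUILD.bazel")


def has_package(directory: Path, root: Path) -> bool:
    """True if `directory` or any parent up to and including `root` has a BUILD file."""
    current = directory
    while True:
        if any((current / name).is_file() for name in BUILD_FILE_NAMES):
            return True
        if current == root or current.parent == current:
            return False
        current = current.parent


def find_orphan_bzl_files(root: str, follow_links: bool = False) -> List[str]:
    """Return .bzl paths (relative to `root`, '/'-separated) that have no BUILD
    file in their own directory or in any parent directory under `root`."""
    root_path = Path(root).resolve()
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(str(root_path), followlinks=follow_links):
        directory = Path(dirpath)
        for name in filenames:
            if name.endswith(".bzl") and not has_package(directory, root_path):
                orphans.append((directory / name).relative_to(root_path).as_posix())
    return sorted(orphans)
```

For example, `find_orphan_bzl_files("build_jaxlib")` run from the Alpa checkout lists any such files under `build_jaxlib`. Symlinked directories are only scanned with `follow_links=True`, which can loop forever if the links form a cycle.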



matthewygf commented 9 months ago

This seems to happen with the CUDA build as well:

```
...
CUDA enabled: yes
NCCL enabled: yes
...
```

EDIT: To answer my own question, the build works for me once I made sure that:

  1. JAX is at commit `41417ee` or version 0.3.22.
  2. The symlinks in `build_jaxlib` work. For some reason, most of them were broken after `git clone`, so I had to recreate the `jax`, `jaxlib`, and `third_party` symlinks.
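The symlink part of that fix can be checked with a small script. This is a minimal Python sketch that assumes, without confirmation from this thread, that the links broke because git checked them out as plain text files holding the target path. Git does this when `core.symlinks` is disabled. `check_build_dir`, `repair_plain_file_link`, and `LINK_NAMES` are hypothetical names. The link targets are read from those placeholder files rather than hard-coded, because the correct targets depend on the checkout.

```python
import os
from pathlib import Path

# Link names taken from the comment above; hypothetical constant for illustration.
LINK_NAMES = ("jax", "jaxlib", "third_party")
MAX_PLACEHOLDER_BYTES = 4096  # git symlink placeholders only hold a short path


def link_status(path: Path) -> str:
    """Classify `path` as 'ok', 'dangling', 'missing', 'directory' or 'plain-file'."""
    if path.is_symlink():
        # exists() follows the link, so False means the target is missing.
        return "ok" if path.exists() else "dangling"
    if not path.exists():
        return "missing"
    if path.is_dir():
        return "directory"
    return "plain-file"


def repair_plain_file_link(path: Path) -> bool:
    """Replace a git-style placeholder file (whose content is the link target)
    with a real symlink. Returns True if a symlink was created."""
    if link_status(path) != "plain-file":
        return False
    if path.stat().st_size > MAX_PLACEHOLDER_BYTES:
        return False
    target = path.read_text().strip()
    if not target or "\n" in target:
        return False  # Does not look like a single link target; leave it alone.
    path.unlink()
    os.symlink(target, str(path))
    return True


def check_build_dir(build_dir: str) -> dict:
    """Report the state of each expected link in `build_dir`, repairing
    placeholder files where possible."""
    report = {}
    for name in LINK_NAMES:
        path = Path(build_dir) / name
        repaired = repair_plain_file_link(path)
        report[name] = ("repaired, " if repaired else "") + link_status(path)
    return report
```

For example, `check_build_dir("build_jaxlib")` reports each link as `ok`, `dangling`, `missing`, and so on. Links reported as `dangling` or `missing` still have to be recreated by hand with the targets your checkout expects.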