alpa-projects / alpa

Training and serving large-scale neural networks with auto parallelization.
https://alpa.ai
Apache License 2.0

Install Alpa without GPU #943

Open AlbertZhangHIT opened 1 year ago

AlbertZhangHIT commented 1 year ago

Please describe the bug: The Alpa-modified jaxlib cannot be built in a CPU-only environment.

Please describe the expected behavior

System information and environment

To Reproduce Steps to reproduce the behavior:

  1. cd build_jaxlib
  2. python build/build.py --dev_install --bazel_options=--override_repository=org_tensorflow=$(pwd)/../third_party/tensorflow-alpa
  3. See error: Failed to build alpa-jax without GPU
    
    Bazel binary path: ./bazel-5.1.1-linux-x86_64
    Bazel version: 5.1.1
    Python binary path: /usr/bin/python
    Python version: 3.7
    NumPy version: 1.21.6
    MKL-DNN enabled: yes
    Target CPU: x86_64
    Target CPU features: release
    CUDA enabled: no
    TPU enabled: no
    Remote TPU enabled: no
    ROCm enabled: no
    Plugin device enabled: no

    Building XLA and installing it in the jaxlib source tree...
    ./bazel-5.1.1-linux-x86_64 run --verbose_failures=true :build_wheel -- --output_path=/home/stack/softwares/alpa/build_jaxlib/dist --cpu=x86_64 --dev_install
    Extracting Bazel installation...
    Starting local Bazel server and connecting to it...
    INFO: Options provided by the client:
      Inherited 'common' options: --isatty=0 --terminal_columns=80
    INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
      Inherited 'common' options: --experimental_repo_remote_exec
    INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
      Inherited 'build' options: --apple_platform_type=macos --macos_minimum_os=10.14 --announce_rc --define open_source_build=true --spawn_strategy=standalone --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true -c opt --config=short_logs --copt=-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir. --@org_tensorflow//tensorflow/compiler/xla/python:enable_gpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_tpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_plugin_device=false
    INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.jax_configure.bazelrc:
      Inherited 'build' options: --strategy=Genrule=standalone --repo_env PYTHON_BIN_PATH=/usr/bin/python --action_env=PYENV_ROOT --python_path=/usr/bin/python --distinct_host_configuration=false --override_repository=org_tensorflow=/home/stack/softwares/alpa/build_jaxlib/../third_party/tensorflow-alpa --config=avx_posix --config=mkl_open_source_only
    INFO: Found applicable config definition build:short_logs in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
    INFO: Found applicable config definition build:avx_posix in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --copt=-mavx --host_copt=-mavx
    INFO: Found applicable config definition build:mkl_open_source_only in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --define=tensorflow_mkldnn_contraction_kernel=1
    INFO: Found applicable config definition build:linux in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --config=posix --copt=-Wno-unknown-warning-option --copt=-Wno-stringop-truncation --copt=-Wno-array-parameter
    INFO: Found applicable config definition build:posix in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --copt=-fvisibility=hidden --copt=-Wno-sign-compare --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
    INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
      Inherited 'build' options: --apple_platform_type=macos --macos_minimum_os=10.14 --announce_rc --define open_source_build=true --spawn_strategy=standalone --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true -c opt --config=short_logs --copt=-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir. --@org_tensorflow//tensorflow/compiler/xla/python:enable_gpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_tpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_plugin_device=false
    ERROR: @org_tensorflow//tensorflow/compiler/xla/python:enable_gpu :: Error loading option @org_tensorflow//tensorflow/compiler/xla/python:enable_gpu: error loading package '': Every .bzl file must have a corresponding package, but '//third_party/ducc:workspace.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.
    b''
    Traceback (most recent call last):
      File "build/build.py", line 580, in <module>
        main()
      File "build/build.py", line 575, in main
        shell(command)
      File "build/build.py", line 53, in shell
        output = subprocess.check_output(cmd)
      File "/usr/lib64/python3.7/subprocess.py", line 421, in check_output
        **kwargs).stdout
      File "/usr/lib64/python3.7/subprocess.py", line 522, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['./bazel-5.1.1-linux-x86_64', 'run', '--verbose_failures=true', ':build_wheel', '--', '--output_path=/home/stack/softwares/alpa/build_jaxlib/dist', '--cpu=x86_64', '--dev_install']' returned non-zero exit status 2.
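
The ERROR line in the log says that `//third_party/ducc:workspace.bzl` has no BUILD file in its own directory or in any parent directory. A rough way to check this by hand is sketched below. The helper name `find_package_dir` is made up for illustration, and the sketch assumes the label is resolved relative to the directory you pass in as `workspace_root`; the log does not confirm which checkout that is, so try the candidates you suspect.

```python
from pathlib import Path
from typing import Optional


def find_package_dir(workspace_root: Path, label: str) -> Optional[Path]:
    """Return the nearest directory with a BUILD file for a '//pkg:file.bzl' label.

    Hypothetical helper: it walks from the label's directory up to
    workspace_root and returns the first directory that contains BUILD or
    BUILD.bazel, or None if there is none.
    """
    # '//third_party/ducc:workspace.bzl' -> 'third_party/ducc'
    package_path = label.lstrip("/").split(":", 1)[0]
    root = Path(workspace_root)
    current = root / package_path
    while True:
        if any((current / name).is_file() for name in ("BUILD", "BUILD.bazel")):
            return current
        if current == root:
            return None
        current = current.parent


if __name__ == "__main__":
    # Assumption: run from the directory you believe is the workspace root.
    print(find_package_dir(Path("."), "//third_party/ducc:workspace.bzl"))
```

If this prints `None`, the directory either lacks a BUILD file or does not exist at all under that root, for example because a symlink leading to it is broken.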



matthewygf commented 10 months ago

The same error seems to occur for a CUDA build as well.

...
CUDA enabled: yes
NCCL enabled: yes
...

EDIT: To answer my own question, the build works for me once I made sure of the following:

  1. JAX is at commit 41417ee or version 0.3.22.
  2. The symlinks in build_jaxlib are valid. Somehow, during git clone, most of them did not come through properly, so I had to recreate the jax, jaxlib and third_party symlinks.
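
For anyone who wants to check the symlink state before rebuilding, here is a minimal sketch. The function name `check_symlinks` is my own, and the default names are just the three entries mentioned above. It only reports the state of each entry; it does not recreate anything, since the correct link targets are not stated in this thread.

```python
from pathlib import Path
from typing import Dict, Iterable


def check_symlinks(
    directory: Path,
    names: Iterable[str] = ("jax", "jaxlib", "third_party"),
) -> Dict[str, str]:
    """Classify each entry as 'ok', 'broken', 'not a symlink', or 'missing'."""
    status = {}
    for name in names:
        entry = Path(directory) / name
        if entry.is_symlink():
            # exists() follows the link, so it is False for a dangling symlink.
            status[name] = "ok" if entry.exists() else "broken"
        elif entry.exists():
            # A regular file or directory where a symlink was expected.
            status[name] = "not a symlink"
        else:
            status[name] = "missing"
    return status


if __name__ == "__main__":
    # Assumption: run from the root of the alpa checkout.
    for name, state in check_symlinks(Path("build_jaxlib")).items():
        print(f"{name}: {state}")
```

Anything other than `ok` is worth a look. A `not a symlink` result can mean git wrote the link out as a plain file, which happens when symlink support is turned off in git (`core.symlinks=false`); whether that is what happened here is only a guess.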