alibaba / BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Apache License 2.0

crosstool_wrapper_driver_is_not_gcc failed #949

Open JaheimLee opened 1 year ago

JaheimLee commented 1 year ago

Hi! I ran into a problem when compiling pytorch_blade with pre+cu117. Here is the log:

DO TORCH_BLADE CI_BUILD
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: pip in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (22.3.1)
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: virtualenv in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (20.17.1)
Requirement already satisfied: filelock<4,>=3.4.1 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from virtualenv) (3.9.0)
Requirement already satisfied: distlib<1,>=0.3.6 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from virtualenv) (0.3.6)
Requirement already satisfied: platformdirs<3,>=2.4 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from virtualenv) (2.6.2)
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, https://download.pytorch.org/whl/nightly/cu117
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Requirement already satisfied: wget in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 1)) (3.2)
Requirement already satisfied: pytest==5.0.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (5.0.0)
Requirement already satisfied: networkx in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 3)) (3.0)
Requirement already satisfied: onnx==1.12.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 4)) (1.12.0)
Requirement already satisfied: hypothesis in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 5)) (6.62.0)
Requirement already satisfied: expecttest in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 6)) (0.1.4)
Requirement already satisfied: py-cpuinfo in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 7)) (9.0.0)
Requirement already satisfied: aliyun-log-python-sdk==0.6.48.6 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (0.6.48.6)
Requirement already satisfied: cryptography in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 9)) (39.0.0)
Requirement already satisfied: torch==2.0.0.dev20230101+cu117 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 11)) (2.0.0.dev20230101+cu117)
Requirement already satisfied: torchvision in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from -r scripts/pip/requirements-dev-pre+cu117.txt (line 12)) (0.15.0.dev20230108+cu117)
Requirement already satisfied: importlib-metadata>=0.12 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (6.0.0)
Requirement already satisfied: pluggy<1.0,>=0.12 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (0.13.1)
Requirement already satisfied: packaging in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (23.0)
Requirement already satisfied: wcwidth in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (0.2.5)
Requirement already satisfied: atomicwrites>=1.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (1.4.1)
Requirement already satisfied: more-itertools>=4.0.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (9.0.0)
Requirement already satisfied: py>=1.5.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (1.11.0)
Requirement already satisfied: attrs>=17.4.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (22.2.0)
Requirement already satisfied: numpy>=1.16.6 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from onnx==1.12.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 4)) (1.24.1)
Requirement already satisfied: typing-extensions>=3.6.2.1 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from onnx==1.12.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 4)) (4.4.0)
Requirement already satisfied: protobuf<=3.20.1,>=3.12.2 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from onnx==1.12.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 4)) (3.20.1)
Requirement already satisfied: jmespath in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (1.0.1)
Requirement already satisfied: dateparser in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (1.1.5)
Requirement already satisfied: six in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (1.16.0)
Requirement already satisfied: python-dateutil in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (2.8.2)
Requirement already satisfied: requests in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (2.28.1)
Requirement already satisfied: elasticsearch in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (8.5.3)
Requirement already satisfied: sympy in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from torch==2.0.0.dev20230101+cu117->-r scripts/pip/requirements-dev-pre+cu117.txt (line 11)) (1.11.1)
Requirement already satisfied: pytorch-triton==2.0.0+0d7e753227 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from torch==2.0.0.dev20230101+cu117->-r scripts/pip/requirements-dev-pre+cu117.txt (line 11)) (2.0.0+0d7e753227)
Requirement already satisfied: cmake in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytorch-triton==2.0.0+0d7e753227->torch==2.0.0.dev20230101+cu117->-r scripts/pip/requirements-dev-pre+cu117.txt (line 11)) (3.25.0)
Requirement already satisfied: filelock in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytorch-triton==2.0.0+0d7e753227->torch==2.0.0.dev20230101+cu117->-r scripts/pip/requirements-dev-pre+cu117.txt (line 11)) (3.9.0)
Requirement already satisfied: exceptiongroup>=1.0.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from hypothesis->-r scripts/pip/requirements-dev-pre+cu117.txt (line 5)) (1.1.0)
Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from hypothesis->-r scripts/pip/requirements-dev-pre+cu117.txt (line 5)) (2.4.0)
Requirement already satisfied: cffi>=1.12 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from cryptography->-r scripts/pip/requirements-dev-pre+cu117.txt (line 9)) (1.15.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from torchvision->-r scripts/pip/requirements-dev-pre+cu117.txt (line 12)) (9.4.0)
Requirement already satisfied: pycparser in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from cffi>=1.12->cryptography->-r scripts/pip/requirements-dev-pre+cu117.txt (line 9)) (2.21)
Requirement already satisfied: zipp>=0.5 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from importlib-metadata>=0.12->pytest==5.0.0->-r scripts/pip/requirements-dev-pre+cu117.txt (line 2)) (3.11.0)
Requirement already satisfied: regex!=2019.02.19,!=2021.8.27 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (2022.10.31)
Requirement already satisfied: pytz in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (2022.7)
Requirement already satisfied: tzlocal in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (4.2)
Requirement already satisfied: elastic-transport<9,>=8 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from elasticsearch->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (8.4.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from requests->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (1.26.13)
Requirement already satisfied: charset-normalizer<3,>=2 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from requests->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (2.1.1)
Requirement already satisfied: certifi>=2017.4.17 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from requests->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (2022.12.7)
Requirement already satisfied: idna<4,>=2.5 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from requests->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (3.4)
Requirement already satisfied: mpmath>=0.19 in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from sympy->torch==2.0.0.dev20230101+cu117->-r scripts/pip/requirements-dev-pre+cu117.txt (line 11)) (1.2.1)
Requirement already satisfied: pytz-deprecation-shim in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from tzlocal->dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (0.1.0.post0)
Requirement already satisfied: tzdata in /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages (from pytz-deprecation-shim->tzlocal->dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-pre+cu117.txt (line 8)) (2022.7)
2023-01-10 11:22:38,022     INFO linking via tao_compiler/file_map ...
2023-01-10 11:22:38,022     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tf_community/tensorflow/compiler/decoupling && ln -s /data/lijinghui/BladeDISC/tao_compiler/decoupling /data/lijinghui/BladeDISC/tf_community/tensorflow/compiler/decoupling`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,029     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tf_community/tensorflow/compiler/mlir/disc && ln -s /data/lijinghui/BladeDISC/tao_compiler/mlir/disc /data/lijinghui/BladeDISC/tf_community/tensorflow/compiler/mlir/disc`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,036     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tf_community/tensorflow/compiler/mlir/xla/ral && ln -s /data/lijinghui/BladeDISC/tao_compiler/mlir/xla/ral /data/lijinghui/BladeDISC/tf_community/tensorflow/compiler/mlir/xla/ral`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,042     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tf_community/tensorflow/../.bazelrc.user && ln -s /data/lijinghui/BladeDISC/tao_compiler/.bazelrc.user /data/lijinghui/BladeDISC/tf_community/tensorflow/../.bazelrc.user`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,049     INFO linking ./tao to tf_community/tao
2023-01-10 11:22:38,049     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tf_community/tao && ln -s /data/lijinghui/BladeDISC/tao /data/lijinghui/BladeDISC/tf_community/tao`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,055     INFO linking PatineClient
2023-01-10 11:22:38,056     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tf_community/tao/third_party/PatineClient && ln -s /data/lijinghui/BladeDISC/../platform_alibaba/third_party/PatineClient /data/lijinghui/BladeDISC/tf_community/tao/third_party/PatineClient`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,062     INFO linking blade_gemm
2023-01-10 11:22:38,062     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tf_community/tao/blade_gemm && ln -s /data/lijinghui/BladeDISC/../platform_alibaba/blade_gemm /data/lijinghui/BladeDISC/tf_community/tao/blade_gemm`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,069     INFO linking blade_service_common
2023-01-10 11:22:38,069     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tf_community/tao/third_party/blade_service_common && ln -s /data/lijinghui/BladeDISC/../platform_alibaba/third_party/blade_service_common /data/lijinghui/BladeDISC/tf_community/tao/third_party/blade_service_common`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,076     INFO cleanup tao_compiler with XLA always...
2023-01-10 11:22:38,076     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tao/tao_bridge/tao_launch_op`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
2023-01-10 11:22:38,081     INFO Execute shell command: `rm -rf /data/lijinghui/BladeDISC/tao/tao_bridge/gpu`, cwd: /data/lijinghui/BladeDISC/pytorch_blade
__version__ = '0.2.0+2.0.0.dev20230101.cu117'
__serialization_version__ = '0.0.3'
debug = True
cuda = '11.7'
cuda_available = True
build_tensorrt = False
static_tensorrt = False
git_version = '22b0141bf244fd7599e4f0181b68bd3dcc3edc95'
torch_version = '2.0.0.dev20230101+cu117'
torch_git_version = 'ede810cc26df8ef75e6eb91a56c2842d925b1539'
GLIBCXX_USE_CXX11_ABI = False

running cpp_test
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Invocation ID: 57630f49-8e07-4d60-a6d7-bf5bd8be4de0
INFO: Reading 'startup' options from /data/lijinghui/BladeDISC/pytorch_blade/.bazelrc: --host_jvm_args=-Djdk.http.auth.tunneling.disabledSchemes=
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'test' from /data/lijinghui/BladeDISC/tf_community/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'test' from /data/lijinghui/BladeDISC/tf_community/.bazelrc:
  Inherited 'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Reading rc options for 'test' from /data/lijinghui/BladeDISC/tao_compiler/.bazelrc.user:
  Inherited 'build' options: --config=nonccl --config=noaws --config=nogcp --config=nohdfs
INFO: Reading rc options for 'test' from /data/lijinghui/BladeDISC/pytorch_blade/.bazelrc:
  Inherited 'build' options: --disk_cache=~/.cache --define is_torch_disc=true --experimental_ui_max_stdouterr_bytes=-1
INFO: Found applicable config definition build:short_logs in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:nonccl in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --define=no_nccl_support=true
INFO: Found applicable config definition build:noaws in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --define=no_aws_support=true
INFO: Found applicable config definition build:nogcp in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --define=no_gcp_support=true
INFO: Found applicable config definition build:nohdfs in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --define=no_hdfs_support=true
INFO: Found applicable config definition build:torch_debug in file /data/lijinghui/BladeDISC/pytorch_blade/.bazelrc: --copt=-O0 --compilation_mode=dbg --strip=never
INFO: Found applicable config definition build:torch_enable_quantization in file /data/lijinghui/BladeDISC/pytorch_blade/.bazelrc: --define enable_quantization=true --copt=-DTORCH_BLADE_BUILD_QUANTIZATION
INFO: Found applicable config definition build:torch_cxx11abi_0 in file /data/lijinghui/BladeDISC/pytorch_blade/.bazelrc: --config=cxx11abi_0 --action_env IF_CXX11_ABI=0
INFO: Found applicable config definition build:cxx11abi_0 in file /data/lijinghui/BladeDISC/tao_compiler/.bazelrc.user: --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0
INFO: Found applicable config definition build:torch_cuda in file /data/lijinghui/BladeDISC/pytorch_blade/.bazelrc: --config=cuda --config=disc_cuda --define enable_cuda=true
INFO: Found applicable config definition build:cuda in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:disc_cuda in file /data/lijinghui/BladeDISC/tao_compiler/.bazelrc.user: --config=disc --config=cuda
INFO: Found applicable config definition build:disc in file /data/lijinghui/BladeDISC/tao_compiler/.bazelrc.user: --define framework_shared_object=false --experimental_multi_threaded_digest
INFO: Found applicable config definition build:cuda in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:linux in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-stringop-overflow --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Wswitch --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /data/lijinghui/BladeDISC/tf_community/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
DEBUG: /home/lijinghui/.cache/bazel/_bazel_lijinghui/6ebda45b359ed1a4e41cc51f9799b1c6/external/org_tensorflow/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'pybind11_bazel' because it already exists.
DEBUG: /home/lijinghui/.cache/bazel/_bazel_lijinghui/6ebda45b359ed1a4e41cc51f9799b1c6/external/org_tensorflow/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'rules_python' because it already exists.
DEBUG: /home/lijinghui/.cache/bazel/_bazel_lijinghui/6ebda45b359ed1a4e41cc51f9799b1c6/external/org_tensorflow/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'pybind11' because it already exists.
Loading: 
Loading: 0 packages loaded
Analyzing: 40 targets (0 packages loaded, 0 targets configured)
INFO: Analyzed 40 targets (0 packages loaded, 0 targets configured).
INFO: Found 5 targets and 35 test targets...
[0 / 4] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[1 / 81] Compiling lib/Dialect/mhlo/IR/hlo_ops.cc; 1s local, remote-cache ... (112 actions running)
[1 / 81] Compiling lib/Dialect/mhlo/IR/hlo_ops.cc; 11s local, remote-cache ... (112 actions running)
[2 / 85] Compiling lib/Dialect/mhlo/IR/hlo_ops.cc; 16s local, remote-cache ... (111 actions, 110 running)
ERROR: /data/lijinghui/BladeDISC/pytorch_blade/pytorch_blade/ltc/disc_compiler/BUILD:51:8: Linking pytorch_blade/ltc/disc_compiler/ltc_disc_test failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-dbg/bin/pytorch_blade/ltc/disc_compiler/ltc_disc_test-2.params
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaGraphDebugDotPrint@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaGraphRetainUserObject@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaUserObjectCreate@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaStreamUpdateCaptureDependencies@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaStreamGetCaptureInfo_v2@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaGraphInstantiateWithFlags@libcudart.so.11.0'
collect2: error: ld returned 1 exit status
[32 / 87] Compiling lib/Dialect/mhlo/IR/hlo_ops.cc; 19s local, remote-cache ... (68 actions running)
INFO: Elapsed time: 19.894s, Critical Path: 19.29s
INFO: 66 processes: 52 internal, 14 local.
FAILED: Build did NOT complete successfully
//pytorch_blade/common_utils:torch_blade_common_utils_test            NO STATUS
//pytorch_blade/compiler/jit:jit_test                                 NO STATUS
//tests/mhlo:autocast.mlir.test                                       NO STATUS
//tests/mhlo:custom_call.mlir.test                                    NO STATUS
//tests/mhlo:custom_fake_quant.mlir.test                              NO STATUS
//tests/mhlo:custom_quantize_dequantize.mlir.test                     NO STATUS
//tests/mhlo:dropout.mlir.test                                        NO STATUS
//tests/mhlo:elementwise.mlir.test                                    NO STATUS
//tests/mhlo:extract.mlir.test                                        NO STATUS
//tests/mhlo:loss.mlir.test                                           NO STATUS
//tests/mhlo:matmul.mlir.test                                         NO STATUS
//tests/mhlo:matmul_half.mlir.test                                    NO STATUS
//tests/mhlo:mem_ops.mlir.test                                        NO STATUS
//tests/mhlo:multi_outputs.mlir.test                                  NO STATUS
//tests/mhlo:ops.mlir.test                                            NO STATUS
//tests/mhlo:reduction.mlir.test                                      NO STATUS
//tests/mhlo:slices.mlir.test                                         NO STATUS
//tests/mhlo:softmax.mlir.test                                        NO STATUS
//tests/mhlo:tensor.mlir.test                                         NO STATUS
//tests/mhlo:unsqueeze_and_squeeze.mlir.test                          NO STATUS
//tests/mhlo:views.mlir.test                                          NO STATUS
//tests/torch-disc-pdll/tests:conv_relu.mlir.test                     NO STATUS
//tests/torch-disc-pdll/tests:dequant_qgemm_quant.mlir.test           NO STATUS
//tests/torch-disc-pdll/tests:fake_quant.mlir.test                    NO STATUS
//tests/torchscript:basics.graph.test                                 NO STATUS
//tests/torchscript:factory_like.graph.test                           NO STATUS
//tests/torchscript:gather_like.graph.test                            NO STATUS
//tests/torchscript:since_1_10.graph.test                             NO STATUS
//tests/torchscript:since_1_11.graph.test                             NO STATUS
//tests/torchscript:since_1_12.graph.test                             NO STATUS
//tests/torchscript:since_1_13.graph.test                             NO STATUS
//tests/torchscript:since_1_14.graph.test                             NO STATUS
//tests/torchscript:slice_like.graph.test                             NO STATUS
//tests/torchscript:view_likes.graph.test                             NO STATUS
//pytorch_blade/ltc/disc_compiler:ltc_disc_test                 FAILED TO BUILD

Executed 0 out of 35 tests: 1 fails to build and 34 were skipped.
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
  File "/data/lijinghui/BladeDISC/pytorch_blade/setup.py", line 151, in <module>
    setup(
  File "/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/data/lijinghui/BladeDISC/pytorch_blade/setup.py", line 107, in run
    self.cpp_run()
  File "/data/lijinghui/BladeDISC/pytorch_blade/setup.py", line 91, in cpp_run
    build.test()
  File "/data/lijinghui/BladeDISC/pytorch_blade/bazel_build.py", line 281, in test
    subprocess.check_call(test_cmd, shell=True, env=env, executable="/bin/bash")
  File "/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -e; set -o pipefail;  source .bazel_pyenv/bin/activate; bazel test --action_env PYTHON_BIN_PATH=/data/miniconda3/envs/ljh_BladeDISC/bin/python3 --action_env BAZEL_LINKLIBS=-lstdc++ --action_env CC=/usr/bin/gcc --action_env CXX=/usr/bin/g++ --action_env DISC_FOREIGN_MAKE_JOBS=32 --copt=-DPYTORCH_VERSION_STRING=\"2.0.0.dev20230101+cu117\" --copt=-DPYTORCH_MAJOR_VERSION=2 --copt=-DPYTORCH_MINOR_VERSION=0 --copt=-DTORCH_BLADE_CUDA_VERSION=11.7 --action_env TORCH_BLADE_TORCH_INSTALL_PATH=/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch --copt=-DPYBIND11_COMPILER_TYPE=\"_gcc\" --copt=-DPYBIND11_STDLIB=\"_libstdcpp\" --copt=-DPYBIND11_BUILD_ABI=\"_cxxabi1011\" --config=torch_debug --config=torch_enable_quantization --config=torch_cxx11abi_0 --config=torch_cuda   //tests/mhlo/... //pytorch_blade:torch_blade_test_suite //tests/torch-disc-pdll/tests/... //tests/torchscript/...' returned non-zero exit status 1.
qiuxiafei commented 1 year ago

According to the following error message:

bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaGraphDebugDotPrint@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaGraphRetainUserObject@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaUserObjectCreate@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaStreamUpdateCaptureDependencies@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaStreamGetCaptureInfo_v2@libcudart.so.11.0'
bazel-out/k8-dbg/bin/_solib_local/_U@local_Uorg_Utorch_S_S_Clibtorch___Ulib/libtorch_cuda.so: undefined reference to `cudaGraphInstantiateWithFlags@libcudart.so.11.0'

It seems that you have CUDA 11.0, while the PyTorch pre-release build requires cu117.
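A minimal sketch for pulling the unresolved symbols out of a linker log like the one above, so they can be checked against a candidate libcudart. The function name `undefined_cuda_symbols` and the regular expression are illustrative assumptions, not part of BladeDISC or Bazel.

```python
import re
from collections import defaultdict

# Matches GNU ld lines such as:
#   .../libtorch_cuda.so: undefined reference to `cudaUserObjectCreate@libcudart.so.11.0'
_UNDEF_RE = re.compile(r"undefined reference to `([^@'`]+)(?:@([^'`]+))?'")


def undefined_cuda_symbols(log_text):
    """Group undefined symbols by the symbol version tag they were linked against."""
    by_version = defaultdict(set)
    for line in log_text.splitlines():
        match = _UNDEF_RE.search(line)
        if match:
            symbol = match.group(1)
            version = match.group(2) or "<unversioned>"
            by_version[version].add(symbol)
    # Sort for stable, readable output.
    return {version: sorted(symbols) for version, symbols in by_version.items()}


# Two lines shortened from the log in this thread, plus a non-matching line.
sample = """\
libtorch_cuda.so: undefined reference to `cudaGraphDebugDotPrint@libcudart.so.11.0'
libtorch_cuda.so: undefined reference to `cudaUserObjectCreate@libcudart.so.11.0'
collect2: error: ld returned 1 exit status
"""
print(undefined_cuda_symbols(sample))
```

Note that the `@libcudart.so.11.0` suffix is a symbol version tag. As far as I know, CUDA 11.x runtimes share the soname `libcudart.so.11.0`, so the tag alone may not prove that toolkit release 11.0 is installed; it only shows that the libcudart found at link time lacks these symbols.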

JaheimLee commented 1 year ago

Yeah. I have multiple CUDA packages installed, and I manually set CUDA_HOME to cuda-11.7, as shown above, both in my .bashrc and in your build_pytorch_blade.sh. Why does it still use CUDA 11.0?

qiuxiafei commented 1 year ago

Sorry, I may have missed something. PyTorch usually ships the CUDA libraries inside its wheel. If you run ldd on your libtorch_cuda.so, you will see something like libcudart-e409450e.so.11.0; this shared library usually sits alongside libtorch_cuda.so in the same directory, and the missing symbols should be in it. Please double-check your PyTorch installation directory, and also run bazel clean --expunge to rule out Bazel-related issues.
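The check described above can be sketched roughly as follows: find which libcudart `ldd` resolves for libtorch_cuda.so, then ask `nm` whether that library exports the missing symbols. This assumes GNU `ldd` and `nm` are on the PATH; all helper names (`resolved_libs`, `exported_symbols`, `missing_from`, `check`) are hypothetical and not part of PyTorch or BladeDISC.

```python
import subprocess


def resolved_libs(ldd_output, name_prefix="libcudart"):
    """Return {soname: path} for ldd lines whose soname starts with name_prefix."""
    libs = {}
    for line in ldd_output.splitlines():
        parts = line.split()
        # Typical line: "libfoo.so.1 => /path/to/libfoo.so.1 (0x0000...)"
        # Unresolved libraries appear as "libfoo.so.1 => not found" and are skipped.
        if (len(parts) >= 3 and parts[1] == "=>" and parts[2] != "not"
                and parts[0].startswith(name_prefix)):
            libs[parts[0]] = parts[2]
    return libs


def exported_symbols(nm_output):
    """Collect symbol names from `nm -D --defined-only` output, dropping version tags."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if parts:
            # The symbol is the last column; strip "@VERSION" / "@@VERSION" if present.
            names.add(parts[-1].split("@")[0])
    return names


def missing_from(lib_path, required):
    """Run nm on lib_path and return the required symbols it does not export."""
    out = subprocess.run(
        ["nm", "-D", "--defined-only", lib_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return sorted(set(required) - exported_symbols(out))


def check(libtorch_cuda_path, required):
    """For each libcudart that ldd resolves, list the required symbols it lacks."""
    ldd = subprocess.run(
        ["ldd", libtorch_cuda_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return {path: missing_from(path, required)
            for path in resolved_libs(ldd).values()}
```

Calling `check(...)` with the path to your own libtorch_cuda.so and the symbol names from the linker error returns a mapping from each resolved libcudart to the symbols it lacks. An empty list for the bundled library would suggest that the wheel itself is fine and that a different libcudart is being picked up at link time, although this sketch cannot confirm which one.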

JaheimLee commented 1 year ago

Here is the output:

(base) lijinghui@idc-op-dev-gpu-001:/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib$ ldd libtorch_cuda.so
        linux-vdso.so.1 (0x00007fff38354000)
        libc10_cuda.so => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libc10_cuda.so (0x00007f0247b7e000)
        libcudart-e409450e.so.11.0 => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libcudart-e409450e.so.11.0 (0x00007f020c7dc000)
        libnvToolsExt-847d78f2.so.1 => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libnvToolsExt-847d78f2.so.1 (0x00007f020c5d1000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f020c3b2000)
        libc10.so => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libc10.so (0x00007f0247aad000)
        libtorch_cpu.so => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libtorch_cpu.so (0x00007f01f325f000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f01f2ec1000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f01f2cbd000)
        libcublas.so.11 => /home/lijinghui/cuda/cuda-11.7/lib64/libcublas.so.11 (0x00007f01e9a5f000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f01e9857000)
        libcudnn.so.8 => /home/lijinghui/cuda/cuda-11.7/lib64/libcudnn.so.8 (0x00007f01e9631000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f01e92a8000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f01e9090000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f01e8c9f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0247a67000)
        libgomp-a34b3233.so.1 => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libgomp-a34b3233.so.1 (0x00007f01e8a75000)
        libcublasLt.so.11 => /home/lijinghui/cuda/cuda-11.7/lib64/libcublasLt.so.11 (0x00007f01d4ad4000)

So what should I do next? For example, should I copy libcudart.so.11.7.99 from the cuda-11.7 directory into the PyTorch installation directory and rename it to libcudart-e409450e.so.11.0?

JaheimLee commented 1 year ago

It didn't work. Maybe I need to build PyTorch from source.
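Before swapping libraries around, one way to test whether a candidate libcudart would satisfy the linker is to compare its exported symbols against the ones named in the link errors. The sketch below is only illustrative: `exported_symbols` and `missing_symbols` are hypothetical helpers, and the input is assumed to be the text printed by `nm -D --defined-only <library>`.

```python
# Symbols reported as undefined references in the link log above.
REQUIRED = [
    "cudaGraphDebugDotPrint",
    "cudaGraphRetainUserObject",
    "cudaUserObjectCreate",
    "cudaStreamUpdateCaptureDependencies",
    "cudaStreamGetCaptureInfo_v2",
    "cudaGraphInstantiateWithFlags",
]


def exported_symbols(nm_output: str) -> set:
    """Collect defined symbol names from `nm -D --defined-only` output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three columns: address, type, name.
        if len(fields) < 3:
            continue
        # Versioned symbols may be printed as "name@@VERSION" or "name@VERSION".
        names.add(fields[-1].split("@", 1)[0])
    return names


def missing_symbols(nm_output: str, required=REQUIRED) -> list:
    """Return the required symbols that the library does not export, sorted."""
    return sorted(set(required) - exported_symbols(nm_output))
```

An empty result for a given library would suggest that it provides everything the link step asked for; a non-empty result lists what is still missing from that particular file.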

JaheimLee commented 1 year ago

I noticed this issue, and it looks like it may not be solvable for now.