arch4edu / arch4edu

Arch Linux Repository for Education
https://arch4edu.org

[Package Request] tensorflow-rocm #98

Closed acxz closed 3 years ago

acxz commented 4 years ago

I think tensorflow-rocm is now working. Or at least the build gets to 14,000/19,000 before I run out of memory (16 GB).

Do you think we can try packaging it?

petronny commented 4 years ago

I think tensorflow-rocm is now working.

I think it's working too, but I need a few more hours to build it again. Update: confirmed, it's working. Peak RAM usage is about 32.23 GiB with -j32.

And there are some minor issues:

  1. Are cuda and cudnn missing from depends or makedepends?
  2. Again, the build server cannot build tensorflow-opt-rocm. But this time lilac cannot simply skip the build command, because it spans multiple lines. Could you add a variable called _build_opt=1 at the top of the PKGBUILD? You may also want to add another one called _build_no_opt=1:
    #...
    _build_no_opt=1
    _build_opt=1
    #...
    pkgname=()
    [ -n "$_build_no_opt" ] && pkgname+=(tensorflow-rocm python-tensorflow-rocm)
    [ -n "$_build_opt" ] && pkgname+=(tensorflow-opt-rocm python-tensorflow-opt-rocm)
    #...
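    build() {
    #...
    # if-blocks instead of `[ ... ] && cmd`: a false test as the last command
    # makes build() return non-zero, which can make makepkg abort.
    if [ -n "$_build_no_opt" ]; then
      bazel build
    fi
    #...
    if [ -n "$_build_opt" ]; then
      bazel build
    fi
    }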
    #...

    Then lilac can just delete _build_opt=1 and the build for tensorflow-opt-rocm will be skipped.
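
The toggle described above can be sketched as follows. This is a minimal, self-contained illustration, not the real tensorflow-rocm PKGBUILD: the file name `PKGBUILD.demo` is made up, and the `grep -v` step is only an assumed stand-in for whatever edit lilac actually performs.

```shell
#!/usr/bin/env bash
# Sketch only: a stripped-down stand-in for the PKGBUILD, written to a temp dir.
workdir="$(mktemp -d)"
cat > "$workdir/PKGBUILD.demo" <<'EOF'
_build_no_opt=1
_build_opt=1
pkgname=()
[ -n "$_build_no_opt" ] && pkgname+=(tensorflow-rocm python-tensorflow-rocm)
[ -n "$_build_opt" ] && pkgname+=(tensorflow-opt-rocm python-tensorflow-opt-rocm)
true  # keep the exit status of the sourced file at zero
EOF

# Source the file in a subshell and print the resulting package list.
list_packages() {
  ( source "$1"; echo "${pkgname[*]}" )
}

before="$(list_packages "$workdir/PKGBUILD.demo")"

# Assumed stand-in for lilac's edit: drop the `_build_opt=1` line.
grep -v '^_build_opt=1$' "$workdir/PKGBUILD.demo" > "$workdir/PKGBUILD.new"
mv "$workdir/PKGBUILD.new" "$workdir/PKGBUILD.demo"

after="$(list_packages "$workdir/PKGBUILD.demo")"

echo "before: $before"
echo "after:  $after"
rm -rf "$workdir"
```

With the line removed, only the two non-opt package names remain in `pkgname`, so a build() guarded by the same variable would skip the opt build as well.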

acxz commented 4 years ago

@petronny that's great!

  1. I don't think cuda and cudnn are needed.
  2. I just pushed out an update that has your suggestions in 2.3.0-8

petronny commented 4 years ago

  1. But nvcc is called at https://github.com/rocm-arch/tensorflow-rocm/blob/master/PKGBUILD#L128

acxz commented 4 years ago

Ah yeah, I kept the cuda vars in there so that it would be easier to merge these changes into the official tensorflow package when the time comes. Since the build still proceeds, would it be fine if I keep that in the PKGBUILD?
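
One way to keep the cuda variables without depending on cuda being installed is to query nvcc only when it is present. The sketch below is an assumption about how such a guard could look, not the code at the linked PKGBUILD line: the helper name `detect_cuda_version` is hypothetical, and the `sed` pattern assumes nvcc prints a line of the form `release 11.0,`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CUDA release reported by nvcc,
# or print nothing when nvcc is not installed.
detect_cuda_version() {
  if command -v nvcc >/dev/null 2>&1; then
    # Assumes a line like "Cuda compilation tools, release 11.0, V11.0.221".
    nvcc --version | sed -n 's/^.*release \([0-9][0-9.]*\),.*/\1/p'
  fi
}

# The variable is always set; it is simply empty on a ROCm-only build host.
export TF_CUDA_VERSION="$(detect_cuda_version)"
echo "TF_CUDA_VERSION='${TF_CUDA_VERSION}'"
```

This keeps the variable in place for a later merge while avoiding a hard failure on machines without cuda.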

acxz commented 4 years ago

Sorry to ping you @petronny, but I'm wondering if anything is holding this PKGBUILD back?

petronny commented 4 years ago

There are some new errors after rocm was updated to 3.7.

Latest build log:

==> Starting build()...
Building with rocm and without non-x86-64 optimizations
You have bazel 3.4.1- (@non-git) installed.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=ngraph         # Build with Intel nGraph support.
    --config=numa           # Build with NUMA support.
    --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
    --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws          # Disable AWS S3 filesystem support.
    --config=nogcp          # Disable GCP support.
    --config=nohdfs         # Disable HDFS support.
    --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc:
  'build' options: --apple_platform_type=macos --define framework_shared_object=true --define open_source_build=true --java_toolchain=//third_party/toolchains/java:tf_java_toolchain --host_java_toolchain=//third_party/toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --config=v2
INFO: Reading rc options for 'build' from /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python --action_env PYTHON_LIB_PATH=/usr/lib/python3.8/site-packages --python_path=/usr/bin/python --config=xla --config=rocm --action_env TF_CONFIGURE_IOS=0
INFO: Found applicable config definition build:v2 in file /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:xla in file /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --action_env=TF_ENABLE_XLA=1 --define=with_xla_support=true
INFO: Found applicable config definition build:rocm in file /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm=true --define=using_rocm_hipcc=true --action_env TF_NEED_ROCM=1
INFO: Found applicable config definition build:mkl in file /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --define=build_with_mkl=true --define=enable_mkl=true --define=tensorflow_mkldnn_contraction_kernel=0 --define=build_with_mkl_dnn_v1_only=true -c opt
INFO: Found applicable config definition build:linux in file /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels
INFO: Found applicable config definition build:dynamic_kernels in file /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Analyzing: 4 targets (2 packages loaded)
Analyzing: 4 targets (2 packages loaded, 0 targets configured)
Analyzing: 4 targets (64 packages loaded, 54 targets configured)
Analyzing: 4 targets (173 packages loaded, 5313 targets configured)
WARNING: /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/core/BUILD:1749:11: in linkstatic attribute of cc_library rule //tensorflow/core:lib_internal: setting 'linkstatic=1' is recommended if there are no object files. Since this rule was created by the macro 'cc_library', the error might have been caused by the macro implementation
Analyzing: 4 targets (200 packages loaded, 15361 targets configured)
Analyzing: 4 targets (306 packages loaded, 19205 targets configured)
WARNING: /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/core/BUILD:1774:11: in linkstatic attribute of cc_library rule //tensorflow/core:lib_headers_for_pybind: setting 'linkstatic=1' is recommended if there are no object files. Since this rule was created by the macro 'cc_library', the error might have been caused by the macro implementation
WARNING: /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/core/BUILD:2161:16: in linkstatic attribute of cc_library rule //tensorflow/core:framework_internal: setting 'linkstatic=1' is recommended if there are no object files. Since this rule was created by the macro 'tf_cuda_library', the error might have been caused by the macro implementation
WARNING: /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/python/BUILD:4662:11: in py_library rule //tensorflow/python:standard_ops: target '//tensorflow/python:standard_ops' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of `tf.distributions` to `tfp.distributions`.
WARNING: /build/tensorflow-rocm/src/tensorflow-2.3.0-rocm/tensorflow/python/BUILD:115:11: in py_library rule //tensorflow/python:no_contrib: target '//tensorflow/python:no_contrib' depends on deprecated target '//tensorflow/python/ops/distributions:distributions': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.distributions will not receive new features, and will be removed by early 2019. You should update all usage of `tf.distributions` to `tfp.distributions`.
Analyzing: 4 targets (387 packages loaded, 32261 targets configured)
INFO: Analyzed 4 targets (387 packages loaded, 32449 targets configured).
INFO: Found 4 targets...

[18 / 415] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[354 / 16,071] checking cached actions
INFO: Elapsed time: 53.179s, Critical Path: 29.23s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
Thu Sep 3 06:43:49 AM CST 2020 : === Preparing sources in dir: /tmp/tmp.xeMsMcnguq
~/tensorflow-rocm/src/tensorflow-2.3.0-rocm ~/tensorflow-rocm/src/tensorflow-2.3.0-rocm
~/tensorflow-rocm/src/tensorflow-2.3.0-rocm
~/tensorflow-rocm/src/tensorflow-2.3.0-rocm/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow ~/tensorflow-rocm/src/tensorflow-2.3.0-rocm
~/tensorflow-rocm/src/tensorflow-2.3.0-rocm
/tmp/tmp.xeMsMcnguq/tensorflow/include ~/tensorflow-rocm/src/tensorflow-2.3.0-rocm
~/tensorflow-rocm/src/tensorflow-2.3.0-rocm
Thu Sep 3 06:44:20 AM CST 2020 : === Building wheel
warning: no files found matching 'README'
warning: no files found matching '*.pyd' under directory '*'
warning: no files found matching '*.pyi' under directory '*'
warning: no files found matching '*.pd' under directory '*'
warning: no files found matching '*.dylib' under directory '*'
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.csv' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*.proto' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
Thu Sep 3 06:46:12 AM CST 2020 : === Output wheel file is in: /build/tensorflow-rocm/src/tmprocm
Building with rocm and with non-x86-64 optimizations
You have bazel 3.4.1- (@non-git) installed.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=ngraph         # Build with Intel nGraph support.
    --config=numa           # Build with NUMA support.
    --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
    --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws          # Disable AWS S3 filesystem support.
    --config=nogcp          # Disable GCP support.
    --config=nohdfs         # Disable HDFS support.
    --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
/startdir/PKGBUILD: line 163: bazel-bin/tensorflow/tools/pip_package/build_pip_package: No such file or directory
==> ERROR: A failure occurred in build().
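
The failure above is the packaging script being invoked when it does not exist under bazel-bin. A minimal sketch of a guard that reports this more clearly is shown below; the helper name `run_build_pip_package` is hypothetical and not part of the real PKGBUILD, and the output directory in the usage comment is only inferred from the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the wheel-packaging script only if bazel
# actually produced it, and fail with an explicit message otherwise.
run_build_pip_package() {
  local tool="$1" outdir="$2"
  if [ ! -x "$tool" ]; then
    echo "error: $tool is missing; was the preceding 'bazel build' skipped, or did it fail?" >&2
    return 1
  fi
  "$tool" "$outdir"
}

# Possible use inside build() (output directory assumed from the log):
#   run_build_pip_package \
#     bazel-bin/tensorflow/tools/pip_package/build_pip_package "$srcdir/tmprocm"
```

The guard does not fix the underlying build problem; it only turns "No such file or directory" into a message that points at the skipped or failed bazel step.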

acxz commented 4 years ago

Yep, can confirm. I'll let you know when we get it working with 3.7.0. It will probably take a while, since I have a busy semester now.

acxz commented 4 years ago

I believe I have patched it for rocm 3.7.0 and above with 2.3.1-2, but I'm not sure whether the above errors are resolved by that patch. In any case, it is worth giving it another shot.

petronny commented 3 years ago

ed38c229631ac2a97ad3475faf06f83d33b2f181

acxz commented 3 years ago

Thanks so much @petronny! This was pretty much the last big thing for the rocm stack on Arch Linux!