gangliao closed this issue 6 years ago.
https://cmake.org/cmake/help/v3.0/module/ExternalProject.html
Paddle's dependencies glog, gflags, gtest, zlib, etc. can all be handled the same way (already tested on Mac and ubuntu): https://github.com/gangliao/CodeCoverageCMake/blob/master/cmake/third_party.cmake
For example:
ExternalProject_Add(gflags
    PREFIX          ${gflags_PREFIX}
    GIT_REPOSITORY  "https://github.com/gflags/gflags.git"
    GIT_TAG         "v2.1.2"
    UPDATE_COMMAND  ""
    INSTALL_DIR     ${gflags_INSTALL}
    CMAKE_ARGS      -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE}
                    -DCMAKE_INSTALL_PREFIX=${gflags_INSTALL}
                    -DBUILD_SHARED_LIBS=OFF
                    -DBUILD_STATIC_LIBS=ON
                    -DBUILD_PACKAGING=OFF
                    -DBUILD_TESTING=OFF
                    -DBUILD_NC_TESTS=OFF
                    -DBUILD_CONFIG_TESTS=OFF
                    -DINSTALL_HEADERS=ON
                    -DCMAKE_C_FLAGS=${GFLAGS_C_FLAGS}
                    -DCMAKE_CXX_FLAGS=${GFLAGS_CXX_FLAGS}
    LOG_DOWNLOAD    1
    LOG_INSTALL     1
)
Many thanks to @gangliao's exhaustive information collection and comparison between cmake and Bazel.
It looks to me that even if we base Paddle's build rules on top of all those works done by Tensorflow and Bazel team, we would still have a complex configuration and building system.
Our original motivation for considering Bazel as a candidate was that we want fine-grained build rules for Paddle's source code, where fine-grained means that each pair of xxx.h and xxx.cc is built in one rule (a library), and the corresponding xxx_test.cc in another rule (a unit test). Can we do this using cmake?
-- It seems that we can. Could @gangliao please provide some examples of this, so we could confidently abandon Bazel and continue with cmake? Thanks!
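As one possible answer, here is a minimal sketch of how such fine-grained rules could look in CMake. The helper names cc_library/cc_test and the SRCS/DEPS keywords are borrowed from Bazel's vocabulary for illustration, not an existing CMake API, and the ddim targets in the usage comment are hypothetical:

```cmake
# Sketch only: hypothetical helpers mimicking Bazel's fine-grained rules.
include(CMakeParseArguments)

# One xxx.h / xxx.cc pair becomes one library target.
function(cc_library TARGET_NAME)
  cmake_parse_arguments(ARG "" "" "SRCS;DEPS" ${ARGN})
  add_library(${TARGET_NAME} STATIC ${ARG_SRCS})
  if(ARG_DEPS)
    target_link_libraries(${TARGET_NAME} ${ARG_DEPS})
  endif()
endfunction()

# The matching xxx_test.cc becomes one unit-test target.
function(cc_test TARGET_NAME)
  cmake_parse_arguments(ARG "" "" "SRCS;DEPS" ${ARGN})
  add_executable(${TARGET_NAME} ${ARG_SRCS})
  target_link_libraries(${TARGET_NAME} ${ARG_DEPS} gtest gtest_main)
  add_test(NAME ${TARGET_NAME} COMMAND ${TARGET_NAME})
endfunction()

# Usage, one rule per pair (target names hypothetical):
#   cc_library(ddim SRCS ddim.cc)
#   cc_test(ddim_test SRCS ddim_test.cc DEPS ddim)
```

With helpers like these, each source directory's CMakeLists.txt reads much like a Bazel BUILD file.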
I have another question about bazel: when bazel depends on a third-party library, how do we define the rule that downloads the library from the internet? It seems we have to specify the download URL and a sha256... but how do we obtain that information? I assumed there would be a maven-style mirror where we could look these values up and copy them over, but there doesn't seem to be one.
native.new_http_archive(
    name = "highwayhash",
    urls = [
        # I assumed bazel-mirror.storage.googleapis.com was a mirror, but it seems to be inaccessible
        "http://bazel-mirror.storage.googleapis.com/github.com/google/highwayhash/archive/dfcb97ca4fe9277bf9dc1802dd979b071896453b.tar.gz",
        "https://github.com/google/highwayhash/archive/dfcb97ca4fe9277bf9dc1802dd979b071896453b.tar.gz",
    ],
    sha256 = "0f30a15b1566d93f146c8d149878a06e91d9bb7ec2cfd76906df62a82be4aac9",
    strip_prefix = "highwayhash-dfcb97ca4fe9277bf9dc1802dd979b071896453b",
    build_file = str(Label("//third_party:highwayhash.BUILD")),
)
What are your thoughts?
They can all be found in the third-party library's releases or tags.
What about sha256 = "0f30a15b1566d93f146c8d149878a06e91d9bb7ec2cfd76906df62a82be4aac9"?
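For what it's worth, the cmake side offers the same verification: ExternalProject_Add accepts a URL_HASH argument, and CMake can compute the hash of an already-downloaded archive itself. A sketch (the local file path is hypothetical):

```cmake
# ExternalProject_Add verifies a downloaded archive when URL_HASH is given:
#   URL_HASH SHA256=0f30a15b1566d93f146c8d149878a06e91d9bb7ec2cfd76906df62a82be4aac9
# To obtain the value for an archive you downloaded yourself, file(<HASH> ...)
# computes it, e.g. in script mode (run with: cmake -P this_script.cmake):
file(SHA256 "${CMAKE_CURRENT_LIST_DIR}/highwayhash.tar.gz" ARCHIVE_SHA256)  # hypothetical path
message(STATUS "sha256 = ${ARCHIVE_SHA256}")
```

So the sha256 can be produced locally once, then pasted into the build rule, whether that rule is a bazel new_http_archive or a cmake ExternalProject_Add.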
Paddle has chosen cmake as its build tool, since cmake also supports downloading and building dependencies. Closing this issue for now.
I have a different concern about bazel. Recently I built tensorflow via bazel, and it was extremely fast. I have changed my mind about bazel.
I found that many people consult me about bazel vs cmake after finding this issue. Hopefully this post can help them make a technical decision; I don't want to bias it.
TensorFlow total build time:
INFO: Elapsed time: 200.266s, Critical Path: 189.28s
@gangliao What machine did you run the tf build on? Was it also a GPU build? Paddle currently takes about 13 minutes to build on the teamcity machines.
@luotao1 Yes, I built tf with MPI + MKL + GPU + LLVM, etc...
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping: 1
CPU MHz: 2200.056
BogoMIPS: 4405.65
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.09 Driver Version: 381.09 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 0000:04:00.0 Off | N/A |
| 23% 28C P8 8W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 0000:06:00.0 Off | N/A |
| 23% 30C P8 8W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN Xp Off | 0000:07:00.0 Off | N/A |
| 23% 27C P8 9W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN Xp Off | 0000:08:00.0 Off | N/A |
| 23% 30C P8 9W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 TITAN Xp Off | 0000:0C:00.0 Off | N/A |
| 23% 24C P8 8W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 TITAN Xp Off | 0000:0D:00.0 Off | N/A |
| 23% 24C P8 8W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 TITAN Xp Off | 0000:0E:00.0 Off | N/A |
| 23% 26C P8 9W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 TITAN Xp Off | 0000:0F:00.0 Off | N/A |
| 23% 29C P8 9W / 250W | 0MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
@gangliao @luotao1 Build speed also depends on how many physical CPU cores the machine has: @gangliao's machine has 40 CPU cores, while the teamcity machines should have 12 cores. That is roughly 3.3x the compute power, so by that arithmetic Paddle should in theory also be able to build in about 4 minutes on a 40-core machine.
In the embarrassingly parallel case, that would be true.
Regarding "Paddle should in theory build in about 4 minutes on a 40-core machine": Paddle currently cannot achieve that.
@gangliao Is there a fundamental difference in build speed between bazel and cmake? Paddle's CI has been getting slower and slower recently. If we switched entirely to bazel, assuming the dependencies between code modules stay the same, would there be a substantial speedup?
we use cmake now
@PaddlePaddle/paddle
Over the past few days I re-examined bazel and found some issues we may not have considered carefully, as well as the cost of switching to bazel.
Comparing CMake and Bazel:
CMake pros: stable and reliable, very flexible, with a syntax plain enough that you can write it without really learning it (my own case). Its biggest advantage, in my view, is that CMake is the mainstream build tool for open-source C++ projects, with a huge community; for almost any feature you want to implement, you can find plenty of projects on github to reference.
CMake cons: too flexible and too customizable; the dependency relationships between source modules are not expressed very clearly.
Bazel pros: Bazel defines many rules to simplify referencing external dependencies, such as git repository and http archive rules, and those rules express the dependencies between modules clearly.
It also supports distributed builds. For tensorflow and paddle, though, this is largely not an advantage: even a laptop has enough compute power that build speed is entirely acceptable; my Mac builds locally in about 3 minutes, and even the travis ci build takes only 7 minutes.
Bazel cons: the learning cost is high. Most targets can be written with the built-in rules cc_library, cc_binary, new_repository, and new_http_archive. But frameworks like tensorflow and paddle have to support different operating systems, different compilers, different heterogeneous hardware, and even various distributed platforms, and customization at that level is very complex; bazel currently supports it through additional crosstool and bzl files. Reading the official bazel documentation shows that bazel is by now largely driven by tensorflow (recent bazel updates mostly come from tensorflow's needs).
Even so, tensorflow has not managed to do everything with bazel alone: it still contains a large number of cmake files, and tensorflow on windows is in fact built with cmake.
Personally, I think bazel is not yet mature; custom bzl files require writing a lot of python code, and the result is not very readable.
Build code analysis
CMake
The cmake code falls into two parts:
Code under the cmake directory: automatic discovery of Paddle's dependency libraries, build-flag configuration, and so on (most of the auto-discovery code is adapted from other projects).
Code under each source module directory: building the sources into shared or static libraries, binaries, and so on.
Putting the two parts together, we ourselves wrote only about 2,000 lines to complete PaddlePaddle's entire build.
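To make that two-part split concrete, a top-level CMakeLists.txt under such a layout might look like the following sketch (all paths and module names are illustrative, not Paddle's actual layout):

```cmake
# Hypothetical top-level sketch: cmake/ holds dependency discovery and
# ExternalProject_Add calls; each source directory holds its own rules.
cmake_minimum_required(VERSION 3.0)
project(PaddlePaddle C CXX)

include(cmake/third_party.cmake)  # gflags, glog, gtest, zlib, ... via ExternalProject_Add

enable_testing()                  # must precede the add_test calls in subdirectories

add_subdirectory(paddle/math)     # per-module libraries and unit tests
add_subdirectory(paddle/utils)
```

The dependency logic is written once under cmake/, while the per-module files stay short and declarative.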
tensorflow
tensorflow first uses a configure script to export various environment variables for bazel to use, then completes the build mainly through workspace, BUILD, and bzl files. Altogether, Tensorflow has close to 30,000 lines of build-related code.
Some opinions from around the web
As you’ve noticed, there are some platforms that Bazel doesn’t currently serve. For example, my colleague Pete Warden has written Makefiles that help to cross-compile TensorFlow for iOS. Aurélien Géron submitted CMake configuration files, and I’m currently adapting these to build TensorFlow on Windows. These builds could be a starting point for platforms that Bazel doesn’t support.
In the longer term, though, we’d prefer to consolidate these into a single cross-platform build, and the Bazel team are actively adding more features (such as Windows support) that should enable this soon.