qiuxiafei opened 2 years ago
Regarding one of our concerns, that we may need to change the current visibility of some Bazel targets, I did some research on the RAL part.
As we can see in RAL's BUILD, the following targets are needed (test dependencies like `//tensorflow/core:test_main` are excluded):
"//tensorflow/core:framework",
"//tensorflow/core:lib",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core:stream_executor_headers_lib",
"//tensorflow/stream_executor",
"//tensorflow/stream_executor/cuda:cuda_platform",
"//tensorflow/stream_executor:cuda_platform",
"//tensorflow/stream_executor:rocm_platform",
"//tensorflow/stream_executor/rocm:rocm_driver",
"//tensorflow/stream_executor/rocm:rocm_platform",
In the current tf community code, only `//tensorflow/stream_executor/rocm:rocm_driver` is not publicly visible. However, `//tensorflow/stream_executor:rocm_platform` depends on it and is a public target we already depend on, so we can replace the dependency on `//tensorflow/stream_executor/rocm:rocm_driver` with `//tensorflow/stream_executor:rocm_platform`.
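For illustration, the swap in RAL's `BUILD` could look like this; the target name and source file are hypothetical:

```
# Hypothetical RAL target: depend on the public rocm_platform target, which
# itself depends on the friends-only rocm_driver.
cc_library(
    name = "ral_rocm_driver_wrapper",       # hypothetical name
    srcs = ["ral_rocm_driver_wrapper.cc"],  # hypothetical source
    deps = [
        # "//tensorflow/stream_executor/rocm:rocm_driver",  # not publicly visible
        "//tensorflow/stream_executor:rocm_platform",  # public; pulls in :rocm_driver
    ],
)
```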
As we can see from tf_community/tensorflow/stream_executor/rocm/BUILD:

```
cc_library(
    name = "rocm_platform",
    srcs = if_rocm_is_configured(["rocm_platform.cc"]),
    hdrs = if_rocm_is_configured(["rocm_platform.h"]),
    visibility = ["//visibility:public"],
    deps = if_rocm_is_configured([
        ":rocm_driver",
        # ...
```
As for the tao_compiler part, most of the dependencies come from the tensorflow/compiler/mlir/hlo directory, and all targets in that directory are public. However, the targets under tensorflow/compiler/xla are only visible to friends. This is the part where we may need to change visibility. For now, carrying a visibility-change patch is necessary, unless we can find a public target that wraps the friends-visible targets.
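For illustration, such a visibility patch could extend XLA's friends package group; this is only a sketch, and the exact group definition and the package we would add are assumptions:

```
# Hypothetical patch to tensorflow/compiler/xla/BUILD: extend the "friends"
# package_group so that our packages can see the friends-visible targets.
package_group(
    name = "friends",
    packages = [
        "//tensorflow/compiler/...",
        "//tensorflow/compiler/mlir/disc/...",  # assumed BladeDISC package path
    ],
)
```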
As we did in "build tao bridge with bazel for cu110 device" (#231), we can successfully build `tao_bridge` for `tensorflow-gpu` versions. However, for `cpu`, or even `arm-cpu` (aka `aarch64`), we ran into a problem with `mkldnn` and `acl`.
Currently we download and build `mkldnn` and `acl` in `common_setup.py`:
https://github.com/alibaba/BladeDISC/blob/d5f085b099a7cfa3dbcdd83d125fcd6c211d69ec/scripts/python/common_setup.py#L257-L313
After this step, the built `mkldnn` and `acl` are used by `tao_bridge` (via `CMake`) and `tao_compiler` (via `bazel`). When used in `tao_compiler`, the `tao` dir is symbolically linked under `tf_community`.
While trying to support a bazel build for `mkldnn`, we found that the newly added `mkldnn` bazel rules under `third_party/bazel/mkldnn` cannot be used in the `tf_community` dir without patching code in `tf_community`. So for now, if we only support a bazel build for the `tao_bridge` part, the download-and-compile step for `mkldnn` cannot be removed. However, once `tao_compiler` becomes a single bazel workspace, we can use our own `mkldnn` bazel rules without doing extra build actions in `common_setup.py`.
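As a sketch of that end state (the repository name, version, and checksum are placeholders, not the actual rules in `third_party/bazel/mkldnn`):

```
# Hypothetical WORKSPACE entry for a standalone tao_compiler workspace that
# builds oneDNN/mkldnn from source with our own BUILD file, instead of using
# the prebuilt artifacts produced by common_setup.py.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "mkl_dnn",  # hypothetical repository name
    build_file = "//third_party/bazel/mkldnn:mkldnn.BUILD",  # assumed BUILD file path
    strip_prefix = "oneDNN-2.6",            # placeholder version
    sha256 = "<checksum placeholder>",
    urls = ["https://github.com/oneapi-src/oneDNN/archive/refs/tags/v2.6.tar.gz"],
)
```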
We have the following actions to follow:

1. `tao_bridge` (for x86)
2. `tao_bridge` (for aarch64)
3. make `tao_compiler` a single bazel workspace
4. make the `tao_compiler` build depend on our mkldnn/acl bazel rules

1 && 2 are ongoing actions.
Update:
In sprint2204, we completed the work of building `tao_bridge` with `bazel` for both the internal and the open-source builds. The internal version also uses the open-source `tao_build.py` now.
As of now, all of `disc`'s targets can be built by `bazel`. The remaining work items are as follows:
- `tao_bridge` build by bazel.
- Unify the `.bazelrc` in `tao_compiler` / `tao` / `pytorch_blade` / `tensorflow_blade` and share script logic as much as possible.
- Build the `tao_compiler` binary inside the `tao_compiler` workspace instead of the `org_tensorflow` workspace.
- Have the `ral` dependencies built by `bazel`; do not use `CMake` in `tao_build.py` anymore.

The last 2 items are somewhat longer-term work items, since the current bazel build from multiple workspaces works fine for now. However, our final goal is still to make the entire repo build in one large workspace.
Tasks:
update 2022-06-27:
- `tao_compiler_main` from tensorflow_blade, see: https://github.com/alibaba/BladeDISC/issues/420
Background

When putting `pytorch_blade` and `tensorflow_blade` to open source, BladeDISC's project structure gets more complex. Currently we already have the following essential directories:

- `tao`: the TF bridge of BladeDISC.
- `tao_compiler`: the BladeDISC compiler main executable, which is symbolically linked to a directory under `tf_community` and built with tensorflow.
- `tf_community`: mirroring tensorflow/tensorflow.
- `mhlo_builder`: the PyTorch bridge of BladeDISC, converting TorchScript to MHLO IR.
- `pytorch_blade`: the Python API to optimize PyTorch models.
- `tensorflow_blade`: the Python API to optimize TF models.

Our Goal and Current Status
As we've discussed many times, we're moving to a MonoRepo with both the open-source and internal code repositories, and using Bazel to build ALL of it. Making everything in this repo buildable with Bazel would make the dependency structure explicit, standard and clean, and help new developers ramp up smoothly. In the ideal state, one could run a universal preparation script once with the necessary arguments, and then `bazel build` or `bazel test` any target from any component. But there are some obstacles in the way:
- `tao` is built with CMake, while `tao_compiler` is built with Bazel. It's worth noting that the RAL code is built on both sides.
- `pytorch_blade` is built by Bazel wrapped in Python setuptools, while `tensorflow_blade` has Bazel calling setuptools.
- `tao_compiler` and the RAL files are symbolically linked into the `tf_community` directory.

Approaches
1. Build `tao` with Bazel.

The `tao` directory is currently built with CMake. Converting from CMake to Bazel is non-trivial but still possible. However, the RAL code, which is built on both the bridge side and the compiler side, makes things complex: RAL is built with CMake on the bridge side, and with Bazel on the compiler side under the `tf_community` directory. The BUILD file of the RAL code loads `tf_community`'s rules, which won't be available on the bridge side, because the bridge only has the include files and shared libraries of the given host tensorflow. https://github.com/alibaba/BladeDISC/blob/a0d60f9f258052c13f1e45f365e451075a7db937/tao_compiler/mlir/xla/ral/BUILD#L3-L5

There may be several solutions (a sketch of the first follows this list):

- Make the `BUILD` file neutral and load only standard Bazel rules, so that it can be used on both the bridge and compiler sides. The same source files can be compiled into different targets for each side. It's also possible to set up an option specifying which side is being built, and use `select` to switch between dependencies from `tf_community` and the host tensorflow.
- Expose a `filegroup` target from the RAL directory, and let each side write its own `cc_library` target in a `BUILD` file under its own directory.
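A minimal sketch of the `select`-based option; the setting name, the `--define` flag, and all labels here are assumptions, not existing code:

```
# Hypothetical BUILD sketch: a flag selects whether RAL builds against
# tf_community's targets or against the headers/libs of the host tensorflow.
config_setting(
    name = "build_for_bridge",  # hypothetical setting
    values = {"define": "ral_side=bridge"},
)

cc_library(
    name = "ral_base",  # hypothetical target
    srcs = ["ral_base.cc"],  # hypothetical source file
    deps = select({
        ":build_for_bridge": ["@local_tensorflow//:tf_headers"],  # assumed host-TF repo
        "//conditions:default": ["@org_tensorflow//tensorflow/core:framework"],
    }),
)
```

Building with `--define=ral_side=bridge` would then flip the dependency set without touching the sources.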
2. Extend `common_setup.py`

If `tao` is built with Bazel, all DISC components could expose Bazel targets (which may or may not be in a single workspace)! Upper-level components like `pytorch_blade` and `tf_blade` could reference those targets and move on with their own builds. `common_setup.py` does the preparation before a build: symbolic linking and OneDNN installation before building DISC. So when building any component that depends on DISC, `common_setup.py` should be called in advance: https://github.com/alibaba/BladeDISC/blob/70ecc07449ade0450fbd0f0f58494c38e683c5b3/pytorch_blade/ci_build/build_pytorch_blade.sh#L45

If we extend `common_setup.py` a little bit, setting the environment variables in `build_pytorch_blade.py`, pytorch_blade will be free from extra build scripts (as for the relationship between Python setuptools and Bazel, see the open questions). If so, why not just make `common_setup.py` a global setup step for the whole project, like the configure script in tensorflow?
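To illustrate the cross-workspace target reference (the repository name, path, and label below are all hypothetical):

```
# Hypothetical WORKSPACE entry in pytorch_blade: import the tao workspace so
# that its Bazel targets can be referenced directly.
local_repository(
    name = "disc",    # hypothetical repository name
    path = "../tao",  # assumed relative path to the tao workspace
)
# A BUILD file could then depend on e.g. "@disc//:tao_bridge" (hypothetical label).
```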
3. Make DISC a Bazel Workspace out of tf_community

We have pretty many Bazel workspaces now. From an architecture point of view, it's natural to have a single Bazel workspace for all of `tao_compiler` / `tao` / `mhlo`, which together make up DISC. Pulling `tao_compiler` out of `tf_community`'s workspace is the key to achieving this goal. I have to admit that it's not a very urgent task, and we may face challenges if many `tf_community`-internal targets are referenced by `tao_compiler`. IREE has done similar work; maybe that can help.

These are just immature thoughts of my own, your comments pls ~
Open Questions
`pytorch_blade` is built by Bazel wrapped in Python setuptools, while `tensorflow_blade` has Bazel calling setuptools.