Wind-River / meta-tensorflow


tensorflow: build for genericx86-64 target fails #1

Closed mhaldrich closed 5 years ago

mhaldrich commented 5 years ago

Issue

Building with MACHINE=genericx86-64 bitbake tensorflow fails (wrong architecture: armeabi; the failure occurs while building eigen). Note: I am using a patched recipe to build with the sumo toolchain.

Any pointers on extending the BUILD or CROSSTOOL files are appreciated.

hongxu-jia commented 5 years ago

Could you please build with poky + meta-openembedded on the master branch? Here are my build steps:

date=`date +%Y%m%d%H`
mkdir build_ts_poky_$date
cd build_ts_poky_$date

topdir=`pwd`
git clone --branch master --single-branch https://github.com/Wind-River/meta-tensorflow.git
git clone --branch master --single-branch git://git.openembedded.org/meta-openembedded
git clone --branch master --single-branch git://git.pokylinux.org/poky.git

machine="genericx86-64"
date=`date +%Y%m%d`
. $topdir/poky/oe-init-build-env build_$machine

cat <<endof_local >> conf/local.conf
MACHINE = "$machine"
PACKAGE_CLASSES = "package_rpm"
VIRTUAL-RUNTIME_init_manager = "systemd"
DISTRO_FEATURES_append = " systemd"
DISTRO_FEATURES_BACKFILL_CONSIDERED_append = " sysvinit"

IMAGE_INSTALL_append = " tensorflow"
IMAGE_FEATURES += "ssh-server-openssh"
IMAGE_INSTALL_append = " tensorflow dhcp-client"
endof_local

cat <<endof_bblayer >> conf/bblayers.conf
BBLAYERS += " \\
$topdir/meta-openembedded/meta-python \\
$topdir/meta-openembedded/meta-oe \\
$topdir/meta-tensorflow \\
"
endof_bblayer

bitbake core-image-minimal
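A note on the cat <<tag>>file lines in the steps above: the shell parses them as cat <<tag >> file, a heredoc whose body (with $variables expanded) is appended to the target file. A minimal standalone sketch of the same idiom, using a throwaway file name (local.conf.demo is just an example):

```shell
# Same heredoc + append idiom as the build steps above, shown on a
# throwaway file. $machine is expanded inside the heredoc body.
machine="genericx86-64"
cat <<endof_demo >> local.conf.demo
MACHINE = "$machine"
endof_demo
cat local.conf.demo
```

Running it prints MACHINE = "genericx86-64", confirming the variable was expanded before the line was appended.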

hongxu-jia commented 5 years ago

I need more details on your failure, log? steps?

mhaldrich commented 5 years ago

@hongxu-jia -- thanks for the reply! I will check out master and report back. I'll post logs, etc. soon. Thanks again for your support.

Update

My fork is here commit https://github.com/mhaldrich/meta-tensorflow/commit/ef4e92cd97ba35b602c65c3e98bffe3d620e58ef

Build against master fails with:

MACHINE=genericx86-64 bitbake tensorflow

My local.conf is similar to yours, and thank you for providing build steps.

Logs:

| Execution platform: @bazel_tools//platforms:host_platform
| In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:124:0,
|                  from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
|                  from ./tensorflow/core/kernels/reduction_ops_common.h:27,
|                  from tensorflow/core/kernels/reduction_ops_sum.cc:16:
| external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h: In static member function 'static void std::_Function_handler<void(_ArgTypes ...), _Functor>::_M_invoke(const std::_Any_data&
, _ArgTypes&& ...) [with _Functor = Eigen::internal::TensorExecutor<Expression, Eigen::ThreadPoolDevice, Vectorizable, Tileable>::run(const Expression&, const Eigen::ThreadPoolDevice&) [with Expression =
const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >,
const Eigen::IndexList<Eigen::type2index<0> >, const Eigen::TensorMap<Eigen::Tensor<const std::complex<float>, 1, 1, long int>, 16, Eigen::MakePointer>, Eigen::MakePointer> >; bool Vectorizable = true; bo
ol Tileable = false]::<lambda(Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorR
eductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::IndexList<Eigen::type2index<0> >, const Eigen::TensorMap<Eigen::Tensor<const std::complex<float>, 1, 1, long int>, 16, Eigen::Mak
ePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, false>::StorageIndex, Eigen::internal::TensorExecutor<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<std::complex<float>, 0, 1,
 long int>, 16, Eigen::MakePointer>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<std::complex<float> >, const Eigen::IndexList<Eigen::type2index<0> >, const Eigen::TensorMap<Eigen::Tensor<c
onst std::complex<float>, 1, 1, long int>, 16, Eigen::MakePointer>, Eigen::MakePointer> >, Eigen::ThreadPoolDevice, true, false>::StorageIndex)>; _ArgTypes = {long int, long int}]':
| external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h:801:9: internal compiler error: in emit_move_insn, at expr.c:3698
|          values[i] = internal::InnerMostDimReducer<Self, Op>::reduce(*this, firstIndex + i * num_values_to_reduce,
|          ^~~~~~
| Please submit a full bug report,
| with preprocessed source if appropriate.
| See <https://gcc.gnu.org/bugs/> for instructions.

and bitbake reports tmp/work/core2-64-poky-linux/tensorflow/1.13.0-r0/temp/run.do_compile.2045:1 exit 1, the same result as I got from the build hash here.

Related issue https://github.com/tensorflow/tensorflow/issues/25323

Next up: I'll check out warrior and retry

hongxu-jia commented 5 years ago

Any update on warrior?

mhaldrich commented 5 years ago

Hi @hongxu-jia -- testing soon, will report back. Thank you for keeping this open.

mhaldrich commented 5 years ago

Building now with a checkout of warrior. When building both aarch64 and genericx86-64 targets, I get an error in the bundled grpc, which seems to be a common failure when building against glibc 2.30. Edit: this is confusing, though, since glibc is v2.29 in warrior

Build trace looks like:

<snip>

| external/grpc/src/core/lib/gpr/log_linux.cc: In function 'void gpr_default_log(gpr_log_func_args*)':
| external/grpc/src/core/lib/gpr/log_linux.cc:77:23: error: 'gettid' was not declared in this scope
|    if (tid == 0) tid = gettid();
|                        ^~~~~~
| external/grpc/src/core/lib/gpr/log_linux.cc:77:23: note: suggested alternative: 'getgid'
|    if (tid == 0) tid = gettid();
|                        ^~~~~~
|                        getgid

</snip>

Looks like you have something here that is relevant.

Previously, in Sumo, I had built aarch64 shared objects but could not build genericx86-64 shared objects.
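On the glibc version confusion above, it helps to distinguish the host glibc from the one the cross build targets. The host side is easy to check from the shell (the target side comes from bitbake's own sysroot, which these commands do not inspect):

```shell
# Print the build host's glibc version two ways; if a native tool ever
# leaks into the cross build, this is the libc it would compile against.
ldd --version | head -n 1
getconf GNU_LIBC_VERSION
```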

Related Issues

https://github.com/clearlinux/distribution/issues/1151
https://github.com/grpc/grpc/pull/18950

hongxu-jia commented 5 years ago

I think I've fixed the issue already, does your local repo contain the following commit? https://github.com/Wind-River/meta-tensorflow/commit/806fb812a3202399f6806aa13b5e032eb825ca58

mhaldrich commented 5 years ago

@hongxu-jia -- with the warrior toolchain setup, I have completed builds for aarch64 and x86-64 targets. Strangely enough, the glibc patchwork didn't do the trick (my repo contained commit 806fb81); removing this patch did work. Commit history is in my fork. I will close the issue.

Thanks again for this bitbake recipe -- very much appreciated!