ARM-software / Tool-Solutions

Tutorials & examples for Arm software development tools.
Apache License 2.0

Tensorflow AArch64 error with build-onednn.sh #25

Open jakemdaly opened 4 years ago

jakemdaly commented 4 years ago

I am trying to build the TensorFlow Docker image on an AArch64 device, but the build fails because ONEDNN_VERSION is not set (it appears to be blank):

 Step 36/101 : RUN $PACKAGE_DIR/build-onednn.sh
 ---> Running in 6b6eb5dd3985
oneDNN VERSION
Cloning into 'mkl-dnn'...
fatal: 'v' is not a commit and a branch 'v' cannot be created from it
The command '/bin/sh -c $PACKAGE_DIR/build-onednn.sh' returned a non-zero code: 128
ERROR: Job failed: command terminated with exit code 128

The commands that fail are in scripts/build-onednn.sh:

cd $PACKAGE_DIR
readonly package=onednn
readonly version=$ONEDNN_VERSION
readonly tf_id=$TF_VERSION_ID
readonly src_host=https://github.com/intel
readonly src_repo=mkl-dnn

# Clone oneDNN
echo "oneDNN VERSION" $version
git clone ${src_host}/${src_repo}.git
cd ${src_repo}
git checkout v$version -b v$version
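For reference, the 'v' in the error message is what the checkout line above expands to when the version variable is empty. The following is only a minimal sketch of that expansion, assuming ONEDNN_VERSION is empty inside the image. Both helper functions (`checkout_cmd`, `require_version`) are hypothetical and are not part of the repository's script:

```shell
#!/bin/sh
# Hypothetical helper (not in build-onednn.sh): prints the checkout command
# that the script's last line would expand to for a given version string.
checkout_cmd() {
  echo "git checkout v$1 -b v$1"
}

# Hypothetical guard (not in build-onednn.sh): returns non-zero with a clear
# message when the version is empty, instead of letting git fail later.
require_version() {
  if [ -z "$1" ]; then
    echo "ONEDNN_VERSION is empty; was it passed to docker build?" >&2
    return 1
  fi
}

# With an empty version, only the literal "v" prefix is left.
checkout_cmd ""    # prints: git checkout v -b v
```

A guard like `require_version "$ONEDNN_VERSION" || exit 1` near the top of a script would make this kind of failure easier to diagnose.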

Any advice?

nSircombe commented 4 years ago

Hi @jakemdaly,

Thanks for your issue - it's very useful to get direct feedback from users.

ONEDNN_VERSION should be set as an environment variable in the Docker image. This is done by passing it into docker build as a --build-arg.

The build.sh script does this by setting a version and passing it to the Docker build in extra_arg.

Inside the Dockerfile, this build argument is picked up and then set as an environment variable in the image. That last step is where the ONEDNN_VERSION read by the build-onednn.sh script, running inside the image, comes from.
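As a rough sketch of that flow, the snippet below prints (rather than runs) a docker build command that forwards a version as a build argument. The build-arg name `onednn_version`, the tag `example-image`, the `<version>` placeholder, and the function name are assumptions for illustration only; they are not copied from build.sh or the Dockerfile:

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the docker build command that would
# forward the given oneDNN version into the image build.
# The build-arg name and image tag are assumed, not taken from build.sh.
print_build_cmd() {
  extra_args="--build-arg onednn_version=$1"
  echo "docker build $extra_args -t example-image ."
}

print_build_cmd "<version>"

# On the Dockerfile side, the matching (assumed) lines would look like:
#   ARG onednn_version
#   ENV ONEDNN_VERSION=$onednn_version
# ARG receives the --build-arg value; ENV stores it in the image under the
# name ONEDNN_VERSION, which is what build-onednn.sh reads.
```

If docker build is invoked directly without the corresponding --build-arg, the ARG is empty, and so is the environment variable derived from it.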

If you're following the steps in the readme then this should all be taken care of by build.sh.

Note: only particular combinations of oneDNN and TensorFlow versions are supported, as there are some patches applied to each at present, so if you alter the version numbers set in build.sh it may not work as expected. We plan to update in the near future to support TF 2.3. Also, oneDNN is only used by the TensorFlow build if the build.sh script is run with the --onednn or --dnnl flags (although it is always built along with the other dependencies).