TF 1.14 GPU, CC 3.0, CUDA 10, cuDNN 7.4, Python 3.7, Ubuntu 16.04

zaimzazali commented 3 years ago

Hey mate,

I tried to reach you on other social media platforms (I don't have a Twitter account) but had no luck, so I am reaching out here instead, if you don't mind. :)

I have tried more than 7 times over 4 whole days just to compile TF with GPU support. I am using a GTX 870M (yeah, it's a 7-year-old laptop haha). I tried Ubuntu 20 and Ubuntu 18, but I could not get either OS to install my GPU driver properly. So I am using Ubuntu 16 instead, and it's quite stable, especially for my very old laptop.

I tried to follow the steps in the TF docs (build from source, https://www.tensorflow.org/install/source#gpu_support_2) but the build keeps failing.

I wonder if you can help me, please? I could not compile the wheel :(

davidenunes commented 3 years ago

Hey,

which other socmed platforms did you try out of curiosity?

To be honest I never tried to build for Ubuntu 16 or that type of card. Do you have any details on the errors you get when you try to build that wheel? Did you also check the compatibility between your CUDA version and your GPU driver version from here?
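A minimal sketch of that compatibility check, assuming the minimum Linux driver for CUDA 10.0 is 410.48 (the value I remember from NVIDIA's release notes; please confirm it for your toolkit build). The function names are made up for illustration.

```python
# Assumed minimum Linux driver version for CUDA 10.0 (verify against
# NVIDIA's CUDA release notes before relying on it).
MIN_DRIVER_FOR_CUDA_10_0 = "410.48"


def parse_version(text):
    """Turn '410.48' or '418.87.01' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda(driver_version, minimum=MIN_DRIVER_FOR_CUDA_10_0):
    """Return True if the installed driver is at least the required minimum."""
    return parse_version(driver_version) >= parse_version(minimum)
```

The installed driver version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to driver_supports_cuda as a string.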

Just a thought: I wonder if it's worth using TF with that particular card. There's an overhead in using the GPU which (in my experience) for older models means you end up running things faster on CPU, as long as you compile TF for your particular CPU arch.

zaimzazali commented 3 years ago

Hey,

Sorry for the late reply. Erm, Facebook? Haha

Yep, I'm pretty sure I followed the tested setup.

Output from ./configure:

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.24.1 installed.
Please specify the location of python. [Default is /home/zaimzazali/ProgramFiles/anaconda3/envs/tf_gpu/bin/python]:

Found possible Python library paths:
  /home/zaimzazali/ProgramFiles/anaconda3/envs/tf_gpu/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/home/zaimzazali/ProgramFiles/anaconda3/envs/tf_gpu/lib/python3.7/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.

Found CUDA 10.0 in:
  /usr/local/cuda/lib64
  /usr/local/cuda/include
Found cuDNN 7 in:
  /usr/local/cuda/lib64
  /usr/include

Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.0]: 3.0

WARNING: XLA does not support CUDA compute capabilities lower than 3.5. Disable XLA when running on older GPUs.
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
  --config=mkl # Build with MKL support.
  --config=monolithic # Config for mostly static monolithic build.
  --config=gdr # Build with GDR support.
  --config=verbs # Build with libverbs support.
  --config=ngraph # Build with Intel nGraph support.
  --config=numa # Build with NUMA support.
  --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
  --config=noaws # Disable AWS S3 filesystem support.
  --config=nogcp # Disable GCP support.
  --config=nohdfs # Disable HDFS support.
  --config=noignite # Disable Apache Ignite support.
  --config=nokafka # Disable Apache Kafka support.
  --config=nonccl # Disable NVIDIA NCCL support.
Configuration finished

Then I tried to build it with Bazel:

bazel build --config=opt --verbose_failures //tensorflow/tools/pip_package:build_pip_package

INFO: Analysed target //tensorflow/tools/pip_package:build_pip_package (394 packages loaded, 20370 targets configured).
INFO: Found 1 target...
ERROR: /home/zaimzazali/.cache/bazel/_bazel_zaimzazali/0c479d6afa1c2483a7b66070953a7e30/external/protobuf_archive/BUILD:91:1: C++ compilation of rule '@protobuf_archive//:protobuf_lite' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
  (cd /home/zaimzazali/.cache/bazel/_bazel_zaimzazali/0c479d6afa1c2483a7b66070953a7e30/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64: \
    PATH=/home/zaimzazali/ProgramFiles/anaconda3/envs/tf_gpu/bin:/home/zaimzazali/ProgramFiles/anaconda3/bin:/home/zaimzazali/ProgramFiles/anaconda3/bin:/usr/local/cuda-10.0/bin:/home/zaimzazali/ProgramFiles/anaconda3/bin:/home/zaimzazali/ProgramFiles/anaconda3/condabin:/home/zaimzazali/bin:/home/zaimzazali/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/host/bin/external/protobuf_archive/_objs/protobuf_lite/common.d '-frandom-seed=bazel-out/host/bin/external/protobuf_archive/_objs/protobuf_lite/common.o' -iquote external/protobuf_archive -iquote bazel-out/host/genfiles/external/protobuf_archive -iquote bazel-out/host/bin/external/protobuf_archive -isystem external/protobuf_archive/src -isystem bazel-out/host/genfiles/external/protobuf_archive/src -isystem bazel-out/host/bin/external/protobuf_archive/src '-std=c++11' -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -g0 '-march=native' -g0 -DHAVE_PTHREAD -DHAVE_ZLIB -Wall -Woverloaded-virtual -Wno-sign-compare -Wno-unused-function -Wno-write-strings -c external/protobuf_archive/src/google/protobuf/stubs/common.cc -o bazel-out/host/bin/external/protobuf_archive/_objs/protobuf_lite/common.o)
Execution platform: @bazel_tools//platforms:host_platform
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 15.833s, Critical Path: 2.40s
INFO: 2 processes: 2 local.
FAILED: Build did NOT complete successfully

Hmmmmmmmmmmmmmmmmm T-T

davidenunes commented 3 years ago

The error says that cc1plus can't be found, so it doesn't look like a TF problem. Did you install build-essential?

Does

locate cc1plus

return anything?

Try re-installing g++, or make a symlink to cc1plus if you already have it somewhere. The build-essential package should include g++.
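As an alternative to locate, here is a rough sketch that asks gcc itself where cc1plus is. It relies on gcc's -print-prog-name option, which prints the full path when the program is found and just echoes the bare name when it is not. The helper names are hypothetical.

```python
import os
import shutil
import subprocess


def resolves_to_real_file(reported):
    """gcc echoes the bare name back when it cannot find the program,
    so only an absolute path to an existing file counts as found."""
    reported = reported.strip()
    return os.path.isabs(reported) and os.path.isfile(reported)


def find_cc1plus(compiler="gcc"):
    """Ask the compiler where cc1plus lives; return the path or None."""
    if shutil.which(compiler) is None:
        return None  # the compiler driver itself is not on PATH
    result = subprocess.run(
        [compiler, "-print-prog-name=cc1plus"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    path = result.stdout.strip()
    return path if resolves_to_real_file(path) else None
```

If find_cc1plus() returns None, the C++ front end is most likely missing, and reinstalling g++ for the same gcc version is the first thing to try.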

PS: I don't have an active FB account, the contacts I use are in my profile or my Webpage.

zaimzazali commented 3 years ago

Hey man,

Sorry for the very late reply. I have been quite busy lately.

So I tried everything again, and got a new error. Haha

ERROR: /home/zaimzazali/Downloads/tensorflow/tensorflow/python/BUILD:453:1: undeclared inclusion(s) in rule '//tensorflow/python:py_seq_tensor':
this rule is missing dependency declarations for the following files included by 'tensorflow/python/lib/core/py_seq_tensor.cc':
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed/limits.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed/syslimits.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stddef.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stdarg.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stdint.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/mmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/emmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/xmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/mm_malloc.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/pmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/tmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/smmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/popcntintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/nmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/immintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/wmmintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avxintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/avx2intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/lzcntintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/bmiintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/bmi2intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/fmaintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/f16cintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/x86intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/ia32intrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/fxsrintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/xsaveintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/xsaveoptintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/adxintrin.h'
  '/usr/lib/gcc/x86_64-linux-gnu/4.8/include/stdbool.h'
tensorflow/python/lib/core/py_seq_tensor.cc: In function ‘tensorflow::Status tensorflow::PySeqToTensor(PyObject, tensorflow::DataType, tensorflow::Tensor)’:
tensorflow/python/lib/core/py_seq_tensor.cc:626:63: warning: ‘infer_dtype’ may be used uninitialized in this function [-Wmaybe-uninitialized]
DataTypeString(infer_dtype));
^
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 38.337s, Critical Path: 9.94s
INFO: 38 processes: 38 local.
FAILED: Build did NOT complete successfully
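Every header in that list sits under a gcc 4.8 directory. One possible reading (an assumption, not a confirmed diagnosis) is that the compiler in use differs from the one Bazel recorded at configure time. A small sketch to pull the gcc versions out of a saved build log, so they can be compared with what gcc -dumpversion reports; the function name and regular expression are illustrative only.

```python
import re

# Matches include directories such as
# /usr/lib/gcc/x86_64-linux-gnu/4.8/include or .../4.8/include-fixed
GCC_INCLUDE = re.compile(r"/usr/lib/gcc/[^/]+/([0-9][0-9.]*)/include")


def gcc_versions_in_log(log_text):
    """Return the sorted, de-duplicated gcc versions named in include paths."""
    return sorted(set(GCC_INCLUDE.findall(log_text)))
```

For the log above this gives a single version, 4.8. If that does not match the compiler selected during ./configure, cleaning the Bazel cache and re-running ./configure is a reasonable next step.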

davidenunes commented 3 years ago

Is it the same issue as this one: https://github.com/tensorflow/tensorflow/issues/39340 ?

zaimzazali commented 3 years ago

I just ran bazel clean --expunge and git clean -fxd before building again.

But I got another new error. By the way, I'm on the r1.14 branch.

ERROR: /home/zaimzazali/Downloads/tensorflow/tensorflow/python/BUILD:336:1: C++ compilation of rule '//tensorflow/python:bfloat16_lib' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
  (cd /home/zaimzazali/.cache/bazel/_bazel_zaimzazali/0c479d6afa1c2483a7b66070953a7e30/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64: \
    PATH=/home/zaimzazali/ProgramFiles/anaconda3/bin:/home/zaimzazali/ProgramFiles/anaconda3/condabin:/usr/local/cuda-10.0/bin:/home/zaimzazali/bin:/home/zaimzazali/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
  .............. (very long list of follow-on errors)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 4633.659s, Critical Path: 337.09s
INFO: 6566 processes: 6566 local.
FAILED: Build did NOT complete successfully

Now I'm trying the suggestions from https://github.com/tensorflow/tensorflow/issues/41584. We'll see how it goes.

zaimzazali commented 3 years ago

Nah, I just gave up. My laptop is too old, I guess. About 70% of the time it could not even detect the GPU.

I'll have to build a new PC then. I just got a Ryzen 5900X, and I'm currently trying to get an RTX 3080, which is nearly impossible to find right now.

davidenunes commented 3 years ago

As I said, the GPU in your laptop would probably not be enough to overcome the overhead of using it. You're better off using a CPU version compiled for your microarchitecture in that case. Depending on what you want to do, it's not that different, honestly.

Also, if you want to test things with a free GPU, try Google Colab notebooks 👍 https://colab.research.google.com
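To see what a CPU-specific build could take advantage of, here is a small sketch that reads the instruction-set flags Linux reports in /proc/cpuinfo. The list of flags worth checking is my own pick, not an official TensorFlow list, and the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(cpuinfo_text, wanted=("sse4_1", "sse4_2", "avx", "avx2", "fma")):
    """Map each instruction set of interest to whether the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}
```

On Linux this can be used as simd_support(open("/proc/cpuinfo").read()). The default -march=native optimization flag offered by ./configure already lets gcc pick these up on the build machine, so the check mostly matters when the wheel is built on one machine and run on another.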

davidenunes / tensorflow-wheels

TF 1.14 GPU, CC 3.0, CUDA 10, cuDNN 7.4, Python 3.7, Ubuntu 16.04 #14