FloopCZ / tensorflow_cc

Build and install TensorFlow C++ API library.
MIT License

tensorflow_cc slow startup, tensorflow over python works as expected #277

Closed: hakan6710 closed this issue 3 years ago

hakan6710 commented 3 years ago

Hi, I tried my own code and your simple example. The startup of the program is really slow and takes a couple of minutes. I am using tensorflow_cc 2.4.0 but also tried 2.5.0. Python code with TensorFlow 2.4.0 works just fine.

Should I try a specific version of tensorflow_cc, or do you have any other recommendations? Thanks in advance.

Nvidia driver: 470.57.02
CUDA: 11.0
cuDNN: 8.0.4.30
TensorRT: 7.1.3.1

2021-08-06 13:38:02.604450: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-08-06 13:38:02.638129: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-06 13:38:02.639236: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-08-06 13:38:02.654740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-06 13:38:02.655259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro P3000 computeCapability: 6.1
coreClock: 1.493GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 156.64GiB/s
2021-08-06 13:38:02.655281: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-08-06 13:38:02.657440: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-08-06 13:38:02.657489: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-08-06 13:38:02.658482: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-08-06 13:38:02.658725: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-08-06 13:38:02.661359: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-08-06 13:38:02.661955: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-08-06 13:38:02.662080: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-08-06 13:38:02.662143: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-06 13:38:02.662614: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-06 13:38:02.662960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-08-06 13:40:47.277820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-06 13:40:47.277841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-08-06 13:40:47.277846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-08-06 13:40:47.277935: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-06 13:40:47.278278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-06 13:40:47.278787: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-06 13:40:47.279217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5220 MB memory) -> physical GPU (device: 0, name: Quadro P3000, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-06 13:40:47.279442: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Session successfully created.
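In the log above, nearly all of the delay sits between "Adding visible gpu devices: 0" (13:38:02.66) and "Device interconnect StreamExecutor" (13:40:47.28), i.e. roughly 2 min 45 s. For a longer log, a short script can locate such a pause from the timestamps. This is a minimal Python sketch, assuming the "YYYY-MM-DD HH:MM:SS.ffffff:" prefix shown above; largest_gap is a hypothetical helper (not part of TensorFlow or tensorflow_cc), and the sample lines are shortened excerpts of the log.

```python
import re
from datetime import datetime

# Matches the timestamp prefix of TensorFlow's C++ log lines,
# e.g. "2021-08-06 13:38:02.662960: I ...".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}):")


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest pause
    between consecutive timestamped lines; other lines are skipped."""
    stamped = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            stamped.append((ts, line))

    best = (0.0, None, None)
    # Compare each timestamped line with the next one.
    for (t0, line0), (t1, line1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, line0, line1)
    return best


# Shortened excerpt of the log above.
log = [
    "2021-08-06 13:38:02.662080: I ... Successfully opened dynamic library libcudnn.so.8",
    "2021-08-06 13:38:02.662960: I ... Adding visible gpu devices: 0",
    "2021-08-06 13:40:47.277820: I ... Device interconnect StreamExecutor with strength 1 edge matrix:",
    "Session successfully created.",
]

gap, before, after = largest_gap(log)
print(f"{gap:.3f} s between:\n  {before}\n  {after}")  # 164.615 s between: ...
```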

hakan6710 commented 3 years ago

I tested your Docker containers with the current version and with ubuntu-cuda-2.3.1. Both have the same problem.

hakan6710 commented 3 years ago

With export TF_CUDA_COMPUTE_CAPABILITIES="6.1" set for my own build, it worked.
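A plausible reading of why this helps (general CUDA behavior, not something confirmed in this thread): if a binary ships no machine code (SASS) matching the GPU's compute capability, the CUDA driver JIT-compiles the embedded PTX when the device is initialized, which for a library the size of TensorFlow can take minutes. The sketch below encodes the standard CUDA compatibility rules to classify a device as native, jit, or unsupported. load_path, parse_cc, and the target lists are hypothetical and do not describe what the tensorflow_cc images actually contained.

```python
def parse_cc(text):
    """Parse a compute capability such as "6.1" into (major, minor)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def load_path(device_cc, sass_targets, ptx_targets):
    """Classify how the CUDA driver can run kernels on a device.

    sass_targets: capabilities with prebuilt machine code (sm_XY).
    ptx_targets:  capabilities with embedded PTX (compute_XY).

    General CUDA rules (ignoring special cases such as
    architecture-specific targets):
      - SASS built for X.Y runs on devices X.Z with Z >= Y (same major);
      - PTX built for X.Y can be JIT-compiled for any device >= X.Y.
    """
    dev = parse_cc(device_cc)
    for major, minor in map(parse_cc, sass_targets):
        if major == dev[0] and minor <= dev[1]:
            return "native"
    # Tuple comparison orders capabilities by (major, minor).
    if any(parse_cc(t) <= dev for t in ptx_targets):
        return "jit"
    return "unsupported"


# Hypothetical build contents, for illustration only.
print(load_path("6.1", ["3.5", "5.2"], ["5.2"]))  # jit
print(load_path("6.1", ["6.1"], ["6.1"]))         # native
print(load_path("6.1", ["7.0"], ["7.0"]))         # unsupported
```

Under these rules, a Quadro P3000 (6.1) falls onto the slow "jit" path whenever the build contains only older SASS plus older PTX, and onto the "native" path once 6.1 (or 6.0) SASS is included.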

I think the TensorFlow repo reads the CUDA compute capabilities in its configure script; could you maybe look into that?

https://github.com/tensorflow/tensorflow/blob/master/configure.py

hakan6710 commented 3 years ago

Can you think of a way to fix this for your Docker containers? For now, I can only use my own build, not your Docker containers.

FloopCZ commented 3 years ago

Hi, the images are built on a server, so reading the capabilities from the driver would probably not help. I can, however, extend the list of compute capabilities; it is about time anyway.
https://github.com/FloopCZ/tensorflow_cc/pull/279
Does this help?
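Extending the list means compiling the CUDA kernels for more GPU architectures, at the cost of longer builds and larger binaries. As an illustration only, the sketch below turns a comma-separated list (the same shape as TF_CUDA_COMPUTE_CAPABILITIES) into plain nvcc -gencode flags. The convention of emitting SASS for every entry plus PTX for the highest one is an assumption, not TensorFlow's actual Bazel build rule; gencode_flags is a hypothetical helper, and the example list is not the one used in the pull request.

```python
def gencode_flags(capabilities):
    """Turn a list like "5.2,6.1,7.0" into nvcc -gencode flags.

    Assumed convention (illustrative, not TensorFlow's build rule):
    machine code (sm_XY) for every listed capability, plus PTX
    (compute_XY) for the highest one so newer GPUs can still JIT it.
    """
    # Parse, de-duplicate, and sort entries as (major, minor) tuples.
    caps = sorted({
        tuple(int(part) for part in entry.strip().split("."))
        for entry in capabilities.split(",")
        if entry.strip()
    })
    if not caps:
        return []

    flags = [f"-gencode=arch=compute_{ma}{mi},code=sm_{ma}{mi}" for ma, mi in caps]
    ma, mi = caps[-1]
    flags.append(f"-gencode=arch=compute_{ma}{mi},code=compute_{ma}{mi}")
    return flags


for flag in gencode_flags("6.1,5.2,7.0"):
    print(flag)
# -gencode=arch=compute_52,code=sm_52
# -gencode=arch=compute_61,code=sm_61
# -gencode=arch=compute_70,code=sm_70
# -gencode=arch=compute_70,code=compute_70
```

Each additional entry adds one more set of machine code, which is why a prebuilt image has to pick its list up front rather than detect the GPU at build time.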

hakan6710 commented 3 years ago

Yes, that should help. I will test it as soon as a Docker container with this version is ready.

FloopCZ commented 3 years ago

The ubuntu images are updated, feel free to try.

hakan6710 commented 3 years ago

Works. Thanks