CSBDeep / CSBDeep_fiji

BSD 2-Clause "Simplified" License

Versions of CUDA, cuDNN, and TensorFlow all need to be compatible #22

Open uschmidt83 opened 5 years ago

uschmidt83 commented 5 years ago

Hi,

The plugin currently uses an old TensorFlow (TF) version (1.6.0), which is incompatible with the installation requirements of more recent TF versions. Specifically, current TF releases require cuDNN >= 7.2 to be installed, which is too new for TF 1.6.0 (see the error in the log below).

Furthermore, installation of cuDNN is currently not mentioned in the documentation.

Best, Uwe

$ ./ImageJ-linux64 --java-home /sw/apps/jdk/current
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
[INFO] imagej-tensorflow version: 1.0.1
[INFO] tensorflow version: 1.6.0
[INFO] The current library path is: LD_LIBRARY_PATH=/sw/apps/cuda/9.0.176/lib64:/sw/apps/cuda/9.0.176/lib:/home/uschmidt/sw/local/lib:/home/uschmidt/tmp/new_fiji/Fiji.app/lib/linux64:/home/uschmidt/tmp/new_fiji/Fiji.app/mm/linux64
[INFO] loading model net_tubulin from source http://csbdeep.bioimagecomputing.com/model-tubulin.zip
[INFO] TensorFlow model cache: /home/uschmidt/tmp/new_fiji/Fiji.app/models
2018-11-19 16:19:18.450296: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-11-19 16:19:18.451773: I tensorflow/cc/saved_model/loader.cc:240] Loading SavedModel with tags: { serve }; from: /home/uschmidt/tmp/new_fiji/Fiji.app/models/net_tubulin
2018-11-19 16:19:18.631059: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-19 16:19:18.631955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:05:00.0
totalMemory: 11.93GiB freeMemory: 11.71GiB
2018-11-19 16:19:18.632026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-11-19 16:22:32.125101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11339 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:05:00.0, compute capability: 5.2)
2018-11-19 16:22:32.283239: I tensorflow/cc/saved_model/loader.cc:159] Restoring SavedModel bundle.
2018-11-19 16:22:32.329594: I tensorflow/cc/saved_model/loader.cc:194] Running LegacyInitOp on SavedModel bundle.
2018-11-19 16:22:32.330448: I tensorflow/cc/saved_model/loader.cc:289] SavedModel load for tags { serve }; Status: success. Took 193878674 microseconds.
[INFO] Shape of input tensor: [-1, -1, -1, 1]
[INFO] Shape of output tensor: [-1, -1, -1, 2]
datasetAxes:[X, Y]
nodeAxes:[Time, Y, X, Channel]
mapping:[2, 1, 0, 3]
--------------
[INFO] Normalize ..
[INFO] Dataset type: 32-bit signed float, converting to FloatType.
[INFO] Dataset dimensions: [720, 576]
[INFO] INPUT NODE:
datasetAxes:[X, Y]
nodeAxes:[Time, Y, X, Channel]
mapping:[2, 1, 0, 3]
--------------
[INFO] OUTPUT NODE:
datasetAxes:[X, Y]
nodeAxes:[Time, Y, X, Channel]
mapping:[2, 1, 0, 3]
--------------
[INFO] Dividing image into 1 tile(s)..
[INFO] Size of single image tile: [720, 576]
[INFO] Final image tiling: [1, 1]
[INFO] Network input size: [720, 576]
[INFO] Processing tile 1..
2018-11-19 16:22:33.661466: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7104 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-11-19 16:22:33.662665: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
[1]    28406 abort (core dumped)  ./ImageJ-linux64 --java-home /sw/apps/jdk/current
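
For readers trying to interpret the cuDNN error in the log above, here is a minimal Python sketch that decodes the two version codes. It assumes the cuDNN 7.x encoding `major * 1000 + minor * 100 + patch`, and it takes the "compatibility version" to be the code rounded down to the nearest hundred, which matches the numbers printed in the log (7104 -> 7100, 7004 -> 7000). The function names are hypothetical, and treating equal compatibility versions as the pass condition is an assumption, since the thread does not show the exact rule TensorFlow applies.

```python
import re

# Error line copied (shortened) from the log above.
LOG_LINE = (
    "Loaded runtime CuDNN library: 7104 (compatibility version 7100) "
    "but source was compiled with 7004 (compatibility version 7000)."
)


def decode_cudnn(code):
    """Split a cuDNN 7.x version code into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def compat_version(code):
    """Round a version code down to the nearest hundred, as the log does."""
    return (code // 100) * 100


def parse_mismatch(line):
    """Return (loaded, compiled) version codes from the error line, or None."""
    match = re.search(
        r"Loaded runtime CuDNN library: (\d+) .*?compiled with (\d+)", line
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


loaded, compiled = parse_mismatch(LOG_LINE)
print("loaded cuDNN:  ", decode_cudnn(loaded))
print("compiled cuDNN:", decode_cudnn(compiled))
# Assumption: a mismatch in compatibility versions is what triggers the error.
print("compatible:", compat_version(loaded) == compat_version(compiled))
```

Under these assumptions, the loaded library is cuDNN 7.1.4 while the TF 1.6.0 binary was built against cuDNN 7.0.4, so the minor versions differ.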

sommerc commented 5 years ago

Hi,

Currently, the CSBDeep update site for Fiji provides TensorFlow 1.12 bindings (tensorflow_jni), which are linked against CUDA Toolkit 9.0 and require cuDNN >= 7.2.1.

In case you want to use CUDA Toolkit 10.0, you can get the latest TensorFlow 1.13 bindings (tensorflow_jni.dll, libtensorflow.jar) from: https://www.tensorflow.org/install/lang_java

Together with cuDNN 7.5.1, everything works nicely!
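
The combinations reported above could be written down as a small lookup table and checked against a local setup. The Python sketch below is only an illustration: the table holds just the values mentioned in this comment (no minimum cuDNN version was stated for TF 1.13, only that 7.5.1 worked), and the names `REPORTED`, `parse_version`, and `check_setup` are hypothetical. It is not an official compatibility matrix.

```python
# Values taken from this comment only; not an official compatibility matrix.
REPORTED = {
    "1.12": {"cuda": "9.0", "min_cudnn": (7, 2, 1)},
    # No minimum cuDNN was stated for 1.13; 7.5.1 is the version reported to work.
    "1.13": {"cuda": "10.0", "min_cudnn": None},
}


def parse_version(text):
    """Turn a dotted version string such as '9.0.176' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_setup(tf_version, cuda_version, cudnn_version):
    """Return a list of problems; an empty list means no conflict was found."""
    key = ".".join(tf_version.split(".")[:2])
    entry = REPORTED.get(key)
    if entry is None:
        return ["no reported combination for TensorFlow " + tf_version]

    problems = []
    # Compare only major.minor of the CUDA Toolkit version.
    if parse_version(cuda_version)[:2] != parse_version(entry["cuda"]):
        problems.append(
            "CUDA " + cuda_version + " found, bindings linked against " + entry["cuda"]
        )
    min_cudnn = entry["min_cudnn"]
    if min_cudnn is not None and parse_version(cudnn_version) < min_cudnn:
        needed = ".".join(str(n) for n in min_cudnn)
        problems.append("cuDNN " + cudnn_version + " found, need >= " + needed)
    return problems


# TF 1.12 bindings with CUDA 9.0.176 and cuDNN 7.1.4 (the versions in the log).
for problem in check_setup("1.12.0", "9.0.176", "7.1.4"):
    print(problem)
```

With the CUDA and cuDNN versions from the log in the first comment, this would flag only the cuDNN version as too old for the 1.12 bindings.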

Would be cool if some of the version requirements could be mentioned in the doc!

Cheers, Chris