NifTK / NiftyNet

[unmaintained] An open-source convolutional neural networks platform for research in medical image analysis and image-guided therapy
http://niftynet.io
Apache License 2.0

problems to run niftynet #446

Open yuniorcf opened 4 years ago

yuniorcf commented 4 years ago

Hello, I am trying to test NiftyNet for the first time, but I am unable to run it. I set up the installation following this site (source code repository): https://niftynet.readthedocs.io/en/dev/installation.html and successfully downloaded the model. However, when I execute the command

    python net_segment.py inference -c ~/niftynet/extensions/dense_vnet_abdominal_ct/config.ini

I get the following errors:

    ....
    -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
    INFO:niftynet: Initialising Dataset from 1 subjects...
    2019-10-01 13:53:56.311601: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-01 13:53:56.312103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725 pciBusID: 0000:01:00.0
    2019-10-01 13:53:56.312156: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
    2019-10-01 13:53:56.312167: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
    2019-10-01 13:53:56.312177: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
    2019-10-01 13:53:56.312186: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
    2019-10-01 13:53:56.312195: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
    2019-10-01 13:53:56.312205: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
    2019-10-01 13:53:56.312215: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
    2019-10-01 13:53:56.312256: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-01 13:53:56.312735: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-01 13:53:56.313199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
    2019-10-01 13:53:56.313229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-10-01 13:53:56.313233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
    2019-10-01 13:53:56.313240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
    2019-10-01 13:53:56.313345: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-01 13:53:56.313816: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-01 13:53:56.314271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6821 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
    INFO:niftynet: Restoring parameters from /home/yunior/niftynet/models/dense_vnet_abdominal_ct/models/model.ckpt-3000
    2019-10-01 13:53:56.630423: W tensorflow/core/common_runtime/colocation_graph.cc:1016] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [ /job:localhost/replica:0/task:0/device:CPU:0]. See below for details of this colocation group:
    Colocation Debug Info:
    Colocation group had the following types and supported devices:
    Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
    IteratorGetNext: CPU GPU XLA_CPU XLA_GPU
    OneShotIterator: CPU
    IteratorToStringHandle: CPU GPU XLA_CPU XLA_GPU

    Colocation members, user-requested devices, and framework assigned devices, if any:
    worker_0/validation/OneShotIterator (OneShotIterator) /device:GPU:0
    worker_0/validation/IteratorToStringHandle (IteratorToStringHandle) /device:GPU:0
    worker_0/validation/IteratorGetNext (IteratorGetNext) /device:GPU:0

    2019-10-01 13:53:57.360882: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
    2019-10-01 13:53:57.991115: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
    2019-10-01 13:53:57.998596: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
    2019-10-01 13:53:58.001047: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
    2019-10-01 13:53:58.001075: W ./tensorflow/stream_executor/stream.h:1995] attempting to perform DNN operation using StreamExecutor without DNN support
    INFO:niftynet: cleaning up...
    INFO:niftynet: stopping sampling threads
    ......

My configuration is as follows:

CPU: Intel i7 (8 cores) and 64 GB RAM
GPU: GeForce RTX 2070, 8 GB, 2304 cores

In addition, I have installed the GPU version of TensorFlow to use the GPU for the calculations. I imagine that the errors are related to memory issues on the GPU. I wonder whether there is a way to use the CPU's memory as well.

Could you please give me some feedback? Note that I am not an expert in Python.

Thanks in advance

danieltudosiu commented 4 years ago

As per Tensorflow issue #24496 it seems to be a tensorflow problem.

Could you please try and run this tensorflow example and let us know if the same error appears there.

yuniorcf commented 4 years ago

Thank you very much for the reply. I tried this and got no errors at all. The script ran the predictions, and these are the final messages:

    ...
    Test accuracy: 0.8813
    (28, 28)
    (1, 28, 28)
    [[3.5098125e-04 1.3001217e-15 9.9916017e-01 4.8920496e-11 4.2484555e-04 5.2356322e-12 6.4001571e-05 5.9205704e-17 5.7315066e-11 2.8146843e-15]]

I have an additional comment that might help to figure out the problem with NiftyNet. I had problems with TensorFlow at the beginning. I have version 1.14.0 of TensorFlow, and apparently NiftyNet has trouble with this version. As a simple solution, the program suggested using tf.compat.v1.Session in several scripts of the software. Therefore I used:

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()

instead of

    import tensorflow as tf

After that, the errors with the TensorFlow session were fixed. Could this be the source of the current problem?

Thank you in advance
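
The import swap described in this comment can be wrapped in a small helper, so that the same script runs whether or not the installed TensorFlow ships the compat.v1 module. This is only a sketch under stated assumptions: `load_tf1_api` is a hypothetical name (it is not part of NiftyNet or TensorFlow), and it returns None when TensorFlow is not installed at all.

```python
import importlib


def load_tf1_api():
    """Return a TF1-style `tf` module, or None if TensorFlow is not installed.

    Hypothetical helper for illustration; not part of NiftyNet or TensorFlow.
    """
    try:
        # Releases that ship the compatibility layer expose the TF1 API here.
        tf = importlib.import_module("tensorflow.compat.v1")
    except ImportError:
        try:
            # Fall back to the plain top-level module on releases without it.
            return importlib.import_module("tensorflow")
        except ImportError:
            return None
    # disable_v2_behavior may be absent in some releases, so look it up defensively.
    disable_v2 = getattr(tf, "disable_v2_behavior", None)
    if callable(disable_v2):
        disable_v2()
    return tf


if __name__ == "__main__":
    tf = load_tf1_api()
    print("TensorFlow not installed" if tf is None else tf.__name__)
```

A script would then call `tf = load_tf1_api()` in place of `import tensorflow as tf`. Nothing in this thread establishes that the import swap itself is related to the cuDNN errors.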

yuniorcf commented 4 years ago

Hi, I have made some progress, I think. I have upgraded the NVIDIA drivers and the CUDA toolkit, and at least I do not see the previous errors anymore. I now have nvidia-418, cuda-10.1 and tf 1.14. However, I get a new error (see below):

    ......
    Traceback (most recent call last):
      File "net_segment.py", line 5, in <module>
        from niftynet import main
      File "/home/yunior/NiftyNet/niftynet/__init__.py", line 62, in <module>
        import niftynet.utilities.user_parameters_parser as user_parameters_parser
      File "/home/yunior/NiftyNet/niftynet/utilities/user_parameters_parser.py", line 22, in <module>
        from niftynet.utilities.user_parameters_default import \
      File "/home/yunior/NiftyNet/niftynet/utilities/user_parameters_default.py", line 10, in <module>
        from niftynet.engine.image_window_dataset import SMALLER_FINAL_BATCH_MODE
      File "/home/yunior/NiftyNet/niftynet/engine/image_window_dataset.py", line 18, in <module>
        from niftynet.layer.base_layer import Layer
      File "/home/yunior/NiftyNet/niftynet/layer/base_layer.py", line 11, in <module>
        from niftynet.engine.application_variables import RESTORABLE
      File "/home/yunior/NiftyNet/niftynet/engine/application_variables.py", line 10, in <module>
        from tensorflow.contrib.framework import list_variables
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/__init__.py", line 37, in <module>
        from tensorflow.contrib import cudnn_rnn
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/cudnn_rnn/__init__.py", line 38, in <module>
        from tensorflow.contrib.cudnn_rnn.python.layers import *
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/__init__.py", line 23, in <module>
        from tensorflow.contrib.cudnn_rnn.python.layers.cudnn_rnn import *
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 20, in <module>
        from tensorflow.contrib.cudnn_rnn.python.ops import cudnn_rnn_ops
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 22, in <module>
        from tensorflow.contrib.rnn.python.ops import lstm_ops
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/rnn/__init__.py", line 91, in <module>
        from tensorflow.contrib.rnn.python.ops.lstm_ops import *
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py", line 298, in <module>
        @ops.RegisterGradient("BlockLSTM")
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 2489, in __call__
        _gradient_registry.register(f, self._op_type)
      File "/home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow_core/python/framework/registry.py", line 61, in register
        (self._name, name, function_name, filename, line_number))
    KeyError: "Registering two gradient with name 'BlockLSTM'! (Previous registration was in register /home/yunior/.conda/envs/my_env/lib/python3.7/site-packages/tensorflow_core/python/framework/registry.py:66)"

Please, could anybody suggest a tentative solution? Thanks
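
The traceback in this comment loads modules from both site-packages/tensorflow/ and site-packages/tensorflow_core/. One possible explanation, which is not confirmed anywhere in this thread, is that files from two different TensorFlow installations ended up in the same environment. A quick way to check is to list every installed distribution whose name mentions tensorflow. The sketch below uses only the standard library; `tensorflow_distributions` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or newer (on 3.7 the `importlib_metadata` backport offers the same interface).

```python
from importlib import metadata  # Python 3.8+; assumption: not available on 3.7


def tensorflow_distributions():
    """Return {distribution name: version} for installed packages mentioning tensorflow.

    Hypothetical helper for illustration only.
    """
    found = {}
    for dist in metadata.distributions():
        # Broken installs can lack a Name field, so fall back to an empty string.
        name = dist.metadata.get("Name") or ""
        if "tensorflow" in name.lower():
            found[name] = dist.version
    return found


if __name__ == "__main__":
    for name, version in sorted(tensorflow_distributions().items()):
        print(name, version)
```

If the listing shows more than one TensorFlow build (for example both a CPU and a GPU package, or two different versions), uninstalling all of them and reinstalling a single version into a fresh environment would be a reasonable next step to try.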

yuniorcf commented 4 years ago

Hi guys, I really need NiftyNet running on my PC. However, after more than a week I am still not able to do it. Could somebody give me some feedback, please? I have been trying to run the example posted here with no success. I have tried several configurations of NVIDIA drivers, CUDA versions, cuDNN and TensorFlow, but have made no progress at all. I currently have Ubuntu 18.04, NVIDIA driver 418, CUDA 10.0 and cuDNN 7.3.0. I see the following messages in the terminal when I execute the program.

    NiftyNet version 0.5.0+185.gb5f3ba1e.dirty
    [CUSTOM]
    -- num_classes: 9
    -- output_prob: False
    -- label_normalisation: False
    -- softmax: True
    -- min_sampling_ratio: 0
    -- compulsory_labels: (0, 1)
    -- rand_samples: 0
    -- min_numb_labels: 1
    -- proba_connect: True
    -- evaluation_units: foreground
    -- do_mixup: False
    -- mixup_alpha: 0.2
    -- mix_match: False
    -- weight: ()
    -- inferred: ()
    -- sampler: ()
    -- label: ('label',)
    -- image: ('ct',)
    -- name: net_segment
    [CONFIG_FILE]
    -- path: /home/yunior/niftynet/extensions/dense_vnet_abdominal_ct/config.ini
    [CT]
    -- csv_file:
    -- path_to_search: ./data/dense_vnet_abdominal_ct/
    -- filename_contains: ('CT',)
    -- filename_not_contains: ()
    -- filename_removefromid:
    -- interp_order: 1
    -- loader: None
    -- pixdim: ()
    -- axcodes: ('A', 'R', 'S')
    -- spatial_window_size: (144, 144, 144)
    [LABEL]
    -- csv_file:
    -- path_to_search: ./data/dense_vnet_abdominal_ct/
    -- filename_contains: ('Label',)
    -- filename_not_contains: ()
    -- filename_removefromid:
    -- interp_order: 0
    -- loader: None
    -- pixdim: ()
    -- axcodes: ('A', 'R', 'S')
    -- spatial_window_size: (144, 144, 144)
    [SYSTEM]
    -- cuda_devices: 0
    -- num_threads: 1
    -- num_gpus: 1
    -- model_dir: /home/yunior/niftynet/models/dense_vnet_abdominal_ct
    -- dataset_split_file: ./dataset_split.csv
    -- event_handler: ('model_saver', 'model_restorer', 'sampler_threading', 'apply_gradients', 'output_interpreter', 'console_logger', 'tensorboard_logger', 'performance_logger')
    -- iteration_generator: iteration_generator
    -- queue_length: 36
    -- action: inference
    [NETWORK]
    -- name: dense_vnet
    -- activation_function: relu
    -- batch_size: 1
    -- smaller_final_batch_mode: pad
    -- decay: 0.0
    -- reg_type: L2
    -- volume_padding_size: (0, 0, 0)
    -- volume_padding_mode: minimum
    -- volume_padding_to_size: (0,)
    -- window_sampling: resize
    -- force_output_identity_resizing: False
    -- queue_length: 5
    -- multimod_foreground_type: and
    -- histogram_ref_file: ./histogram_ref_file.txt
    -- norm_type: percentile
    -- cutoff: (0.01, 0.99)
    -- foreground_type: otsu_plus
    -- normalisation: False
    -- rgb_normalisation: False
    -- whitening: False
    -- normalise_foreground_only: False
    -- weight_initializer: he_normal
    -- bias_initializer: zeros
    -- keep_prob: 1.0
    -- weight_initializer_args: {}
    -- bias_initializer_args: {}
    [TRAINING]
    -- optimiser: adam
    -- sample_per_volume: 1
    -- rotation_angle: ()
    -- rotation_angle_x: ()
    -- rotation_angle_y: ()
    -- rotation_angle_z: ()
    -- scaling_percentage: ()
    -- isotropic_scaling: False
    -- antialiasing: True
    -- bias_field_range: ()
    -- bf_order: 3
    -- random_flipping_axes: -1
    -- do_elastic_deformation: False
    -- num_ctrl_points: 4
    -- deformation_sigma: 15
    -- proportion_to_deform: 0.5
    -- lr: 0.001
    -- loss_type: dense_vnet_abdominal_ct.dice_hinge.dice
    -- starting_iter: 0
    -- save_every_n: 1000
    -- tensorboard_every_n: 20
    -- max_iter: 3001
    -- max_checkpoints: 100
    -- validation_every_n: -1
    -- validation_max_iter: 1
    -- exclude_fraction_for_validation: 0.0
    -- exclude_fraction_for_inference: 0.0
    -- vars_to_restore:
    -- vars_to_freeze:
    -- patience: 100
    -- early_stopping_mode: mean
    [INFERENCE]
    -- spatial_window_size: (144, 144, 144)
    -- inference_iter: 3000
    -- dataset_to_infer:
    -- save_seg_dir: ./segmentation_output/
    -- output_postfix: _niftynet_out
    -- output_interp_order: 0
    -- border: (0, 0, 0)
    -- fill_constant: 0.0
    INFO:niftynet: set CUDA_VISIBLE_DEVICES to 0
    INFO:niftynet: starting segmentation application
    INFO:niftynet: csv_file = not found, writing to "/home/yunior/niftynet/models/dense_vnet_abdominal_ct/ct.csv" instead.
    INFO:niftynet: [ct] search file folders, writing csv file /home/yunior/niftynet/models/dense_vnet_abdominal_ct/ct.csv
    INFO:niftynet: csv_file = not found, writing to "/home/yunior/niftynet/models/dense_vnet_abdominal_ct/label.csv" instead.
    INFO:niftynet: [label] search file folders, writing csv file /home/yunior/niftynet/models/dense_vnet_abdominal_ct/label.csv
    INFO:niftynet:

    Number of subjects 1, input section names: ['subject_id', 'ct', 'label']
    -- using all subjects (without data partitioning).

    INFO:niftynet: Image reader: loading 1 subjects from sections ('ct',) as input [image]
    2019-10-09 13:38:12.946391: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-10-09 13:38:12.949613: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
    2019-10-09 13:38:12.949963: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55fa8e60d9d0 executing computations on platform Host. Devices:
    2019-10-09 13:38:12.949975: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
    2019-10-09 13:38:13.047755: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-09 13:38:13.048308: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55fa8d4b5a40 executing computations on platform CUDA. Devices:
    2019-10-09 13:38:13.048320: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
    2019-10-09 13:38:13.048406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.725 pciBusID: 0000:01:00.0 totalMemory: 7.76GiB freeMemory: 7.21GiB
    2019-10-09 13:38:13.048414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
    2019-10-09 13:38:13.049017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-10-09 13:38:13.049024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
    2019-10-09 13:38:13.049027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
    2019-10-09 13:38:13.049085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 7012 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
    INFO:niftynet: reading size of preprocessed images
    INFO:niftynet: initialised resize sampler {'image': (1, 144, 144, 144, 1, 1), 'image_location': (1, 7)}
    INFO:niftynet: using DenseVNet
    2019-10-09 13:38:13.056568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
    2019-10-09 13:38:13.056584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-10-09 13:38:13.056588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
    2019-10-09 13:38:13.056591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
    2019-10-09 13:38:13.056641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 7012 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
    INFO:niftynet: Initialising Dataset from 1 subjects...
    2019-10-09 13:38:14.395612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
    2019-10-09 13:38:14.395650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-10-09 13:38:14.395654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
    2019-10-09 13:38:14.395657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
    2019-10-09 13:38:14.395743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7012 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
    INFO:niftynet: Restoring parameters from /home/yunior/niftynet/models/dense_vnet_abdominal_ct/models/model.ckpt-3000
    2019-10-09 13:38:15.997092: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
    2019-10-09 13:38:15.998516: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
    2019-10-09 13:38:16.000078: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
    2019-10-09 13:38:16.000099: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
    INFO:niftynet: cleaning up...
    INFO:niftynet: stopping sampling threads
    Traceback (most recent call last):
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
        return fn(*args)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
        options, feed_dict, fetch_list, target_list, run_metadata)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[{{node worker_0/DenseVNet/convbn/conv/conv}}]]
      [[{{node worker_0/post_processing/ExpandDims}}]]

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "net_segment.py", line 8, in <module>
        sys.exit(main())
      File "/home/yunior/NiftyNet/niftynet/__init__.py", line 148, in main
        app_driver.run(app_driver.app)
      File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 206, in run
        loop_status=loop_status)
      File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 332, in loop
        ApplicationDriver.loop_step(application, iter_msg)
      File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 364, in loop_step
        feed_dict=iteration_message.data_feed_dict)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
        run_metadata_ptr)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
        feed_dict_tensor, options, run_metadata)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
        run_metadata)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[node worker_0/DenseVNet/convbn/conv/conv (defined at /home/yunior/NiftyNet/niftynet/layer/convolution.py:100) ]]
      [[node worker_0/post_processing/ExpandDims (defined at /home/yunior/NiftyNet/niftynet/layer/post_processing.py:36) ]]

    Caused by op 'worker_0/DenseVNet/convbn/conv/conv', defined at:
      File "net_segment.py", line 8, in <module>
        sys.exit(main())
      File "/home/yunior/NiftyNet/niftynet/__init__.py", line 148, in main
        app_driver.run(app_driver.app)
      File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 190, in run
        is_training_action=self.is_training_action)
      File "/home/yunior/NiftyNet/niftynet/engine/application_driver.py", line 271, in create_graph
        outputs_collector, gradients_collector)
      File "/home/yunior/NiftyNet/niftynet/application/segmentation_application.py", line 458, in connect_data_and_network
        net_out = self.net(image, net_args)
      File "/home/yunior/NiftyNet/niftynet/layer/base_layer.py", line 35, in __call__
        return self._op(*args, **kwargs)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 360, in __call__
        return self._call_func(args, kwargs)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
        result = self._func(*args, **kwargs)
      File "/home/yunior/NiftyNet/niftynet/network/dense_vnet.py", line 233, in layer_op
        input_tensor, is_training=is_training)
      File "/home/yunior/NiftyNet/niftynet/layer/base_layer.py", line 35, in __call__
        return self._op(*args, **kwargs)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 360, in __call__
        return self._call_func(args, kwargs)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
        result = self._func(*args, **kwargs)
      File "/home/yunior/NiftyNet/niftynet/layer/convolution.py", line 254, in layer_op
        output_tensor = activation(conv_layer(input_tensor))
      File "/home/yunior/NiftyNet/niftynet/layer/base_layer.py", line 35, in __call__
        return self._op(*args, **kwargs)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 360, in __call__
        return self._call_func(args, kwargs)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
        result = self._func(*args, **kwargs)
      File "/home/yunior/NiftyNet/niftynet/layer/convolution.py", line 100, in layer_op
        name='conv')
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
        return op(input, filter)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
        return self.conv_op(inp, filter)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
        return self.call(inp, filter)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
        name=self.name)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1440, in conv3d
        dilations=dilations, name=name)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
        op_def=op_def)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
        op_def=op_def)
      File "/home/yunior/NN_env/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
        self._traceback = tf_stack.extract_stack()

    UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[node worker_0/DenseVNet/convbn/conv/conv (defined at /home/yunior/NiftyNet/niftynet/layer/convolution.py:100) ]]
      [[node worker_0/post_processing/ExpandDims (defined at /home/yunior/NiftyNet/niftynet/layer/post_processing.py:36) ]]

I would like to add that when I run the program without GPU support, the software works, but slowly.

Thank you in advance

danieltudosiu commented 4 years ago

I have never encountered your problem. Also, it seems to be more TensorFlow- and CUDA-related than NiftyNet-related, which is supported by the fact that it works on the CPU but not on the GPU.

Could you please modify the following function in util_common.py:

       def tf_config():
           """
           tensorflow system configurations
           """
           config = tf.ConfigProto()
           config.log_device_placement = False
           config.allow_soft_placement = True
           return config

with

       def tf_config():
           """
           tensorflow system configurations
           """
           config = tf.ConfigProto()
           config.log_device_placement = False
           config.allow_soft_placement = True
           # allocate GPU memory on demand instead of reserving it all up front
           config.gpu_options.allow_growth = True
           return config

talmazov commented 4 years ago

I have NiftyNet 0.6, CUDA 10.0, tensorflow-gpu 1.13.2 and numpy 1.16, using a GeForce RTX 2060 (6 GB VRAM) with NVIDIA driver 440.33.01. TensorFlow tries to allocate 5 GB with spatial_window_size = (64, 64, 512) and the dense_vnet network.

I've tried config.gpu_options.allow_growth = True, but it doesn't seem to work. I get the same "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR".

Any solution so far? I am not sure if legacy drivers will work better; maybe the v390 NVIDIA driver is compatible? I wonder if this memcpy and cuDNN internal error is related to the newer drivers/cards. I bought a GTX 1080 Ti with 11 GB of RAM and will see whether that one supports NiftyNet.

yuniorcf commented 4 years ago

Hello, I am not an expert in Python programming, so I don't know the proper way to do it. As in your case, I also tried "config.gpu_options.allow_growth = True", but for whatever reason it didn't work for me either. However, since it is not much of a problem for me, I type the following command before running NiftyNet:

    export TF_FORCE_GPU_ALLOW_GROWTH=true

This solved my problem. I hope this helps you. If someone knows an easy and permanent way to do it, please share it. Best
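
One way to avoid typing the export command each time is to set the same environment variable from Python before TensorFlow initialises the GPU; doing it before the import is the simplest way to guarantee the ordering. The sketch below is an assumption about how a small launcher script could do this, not something NiftyNet provides: `force_gpu_allow_growth` is a hypothetical helper name, and only the variable name TF_FORCE_GPU_ALLOW_GROWTH comes from the comment above.

```python
import os


def force_gpu_allow_growth(env=None):
    """Set TF_FORCE_GPU_ALLOW_GROWTH=true unless a value was already chosen.

    Hypothetical helper for illustration. `env` defaults to os.environ;
    passing a plain dict makes the function easy to try out in isolation.
    """
    env = os.environ if env is None else env
    # setdefault keeps any value the user exported explicitly in the shell.
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


if __name__ == "__main__":
    # Call this first, then import tensorflow / niftynet afterwards.
    print(force_gpu_allow_growth())
```

The shell-level equivalent would be to append the same export line to a startup file such as ~/.bashrc, so that every new terminal session has the variable set.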
