csyben / PYRO-NN

Python Reconstruction Operators in Neural Networks. High level python API for PYRO-NN-Layers
Apache License 2.0
108 stars 34 forks source link

PYRO-NN and tf.layers.conv2d #4

Closed pgmoka closed 3 years ago

pgmoka commented 5 years ago

Hello! I am with a team working on implementing PYRO-NN with tf.conv2d, and we were running into this error:

>           [[{{node FanProjection2D}} = FanProjection2D[central_ray_vectors=Tensor<type: float shape: [90,2] values: [1 0][0.997564077...]...>, detector_origin=Tensor<type: float shape: [1] values: -629.732544>, detector_spacing=Tensor<type: float shape: [1] values: 1.73003447>, projection_shape=[90,729], source_2_detector_distance=2000, source_2_isocenter_distance=1000, volume_origin=Tensor<type: float shape: [2] values: -255.5 -255.5>, volume_shape=[512,512], volume_spacing=Tensor<type: float shape: [2] values: 1 1>](ones_1)]]
> 2019-08-05 17:55:45.072870: E tensorflow/core/common_runtime/executor.cc:623] Executor failed to create kernel. Not found: No registered 'FanBackprojection2D' OpKernel for CPU devices compatible with node {{node FanBackprojection2D}} = FanBackprojection2D[central_ray_vectors=Tensor<type: float shape: [90,2] values: [1 0][0.997564077...]...>, detector_origin=Tensor<type: float shape: [1] values: -629.732544>, detector_spacing=Tensor<type: float shape: [1] values: 1.73003447>, sinogram_shape=[90,729], source_2_detector_distance=2000, source_2_isocenter_distance=1000, volume_origin=Tensor<type: float shape: [2] values: -255.5 -255.5>, volume_shape=[512,512], volume_spacing=Tensor<type: float shape: [2] values: 1 1>](ones)
>         .  Registered:  device='GPU'
> 
>          [[{{node FanBackprojection2D}} = FanBackprojection2D[central_ray_vectors=Tensor<type: float shape: [90,2] values: [1 0][0.997564077...]...>, detector_origin=Tensor<type: float shape: [1] values: -629.732544>, detector_spacing=Tensor<type: float shape: [1] values: 1.73003447>, sinogram_shape=[90,729], source_2_detector_distance=2000, source_2_isocenter_distance=1000, volume_origin=Tensor<type: float shape: [2] values: -255.5 -255.5>, volume_shape=[512,512], volume_spacing=Tensor<type: float shape: [2] values: 1 1>](ones)]]
> 2019-08-05 17:55:45.082548: E tensorflow/core/common_runtime/executor.cc:623] Executor failed to create kernel. Not found: No registered 'FanProjection2D' OpKernel for CPU devices compatible with node {{node FanProjection2D}} = FanProjection2D[central_ray_vectors=Tensor<type: float shape: [90,2] values: [1 0][0.997564077...]...>, detector_origin=Tensor<type: float shape: [1] values: -629.732544>, detector_spacing=Tensor<type: float shape: [1] values: 1.73003447>, projection_shape=[90,729], source_2_detector_distance=2000, source_2_isocenter_distance=1000, volume_origin=Tensor<type: float shape: [2] values: -255.5 -255.5>, volume_shape=[512,512], volume_spacing=Tensor<type: float shape: [2] values: 1 1>](ones_1)
>         .  Registered:  device='GPU'

For reference, the layers are constructed by:

> def SART(sino, geometry, dx, numpix, numits, D, M, current_x, eps, batch_size):
>     for iteration in range(numits):
>         print("ART Layer number ", iteration)
>         with tf.variable_scope('ART_layer{}'.format(iteration)):
>             # print("CURRENT_X SHAPE ========", current_x)
>     
>             if isinstance(geometry,GeometryParallel2D):
>                 fp = projection_2d.parallel_projection2d(current_x, geometry)
>             else:
>                 fp = projection_2d.fan_projection2d(current_x, geometry)
>                 print("FP SHOULD PRINT AFTER THIS")
>                 # print(fp)
>                 
>             diff = tf.subtract(tf.multiply(fp,dx), sino, name="diff")
>             print("diff ========", diff)
>             tmp1 = tf.divide(diff, M, name="tmp1")
>             # print("tmp1 ========", tmp1)
>             
>             if isinstance(geometry,GeometryParallel2D):
>                 bp = backprojection_2d.parallel_backprojection2d(tmp1, geometry)
>                 # print("bp ========", bp)
>             else:
>                 bp = backprojection_2d.fan_backprojection2d(tmp1, geometry)
>                 # print("bp ========", bp)
>                 
>             ind2 = tf.greater(tf.abs(bp), 1e3, name="ind2")
>             # print("ind2 ========", ind2)
>             zeros = tf.zeros_like(bp, name="zeros")
>             bp = tf.where(ind2,zeros,bp, name="bp")
>             # print("bp ========", bp)
>             tmp2 = tf.divide(bp, D, name="tmp2")
>             # print("tmp2 ========", tmp2) 
>             current_x = tf.math.subtract(current_x, tmp2, name="current_x_minus_tmp2")
>             # print("CURRENT X AFTER SUBTRACT", current_x)       
>         out = cnn_layers(current_x, numpix, batch_size,iteration)         
>         current_x = tf.math.subtract(out, current_x, name="Reunion") # Need to take in numpix[0]
>         current_x = tf.maximum(current_x,eps, name="Maximum")                 #set any negative values to small positive value
> 
>     # print("RETURNING HOME")
>     return current_x

And in there, cnn_layers is:

> def cnn_layers(user_input, numpix, batch_size, iteration, is_training=True, output_channels=1):
>     print("CNN Layer Number ", iteration) 
>     with tf.variable_scope('CNN_layer{}'.format(iteration)):
>         user_input = tf.reshape(user_input,[batch_size, numpix[0], numpix[0], output_channels])
>         print(user_input)
>         output = tf.layers.conv2d(user_input, 64, 3, padding='same', activation=tf.nn.relu)
>         output = tf.layers.conv2d(output, 64, 3, padding='same', name='conv2', use_bias=False)
>         output = tf.nn.relu(tf.layers.batch_normalization(output, training=is_training))
>         output = tf.layers.conv2d(output, output_channels, 3, padding='same')
>     return output

We know that shape is lost at fp = projection_2d.fan_projection2d(current_x, geometry), and that the bug reported above is thrown later in the code when the line optimizer = tf.train.AdamOptimizer(self.lr, name='AdamOptimizer') is called. It seems like this may be similar to the issue in https://github.com/csyben/PYRO-NN-Layers/issues/3.

We have a few ideas of what may be causing it, but we are unsure.

  1. If pyroNN implicitly calculates the gradients inside of the projection and backprojection files, and the return value is a list, is it still compatible with tensorflow placeholders? After passing a tensor through the fan back/forward projection operators our tensor loses it's shape.
  2. Projections (both forward and backwards) are OpNodes in the gradients section of our tensorboard, could that be an issue?
  3. In general, is the current version of PyroNN layers (master branch) compatible with convolutional layers in a tensorflow graph? We were particulaly lookinng a this Issue: https://github.com/csyben/PYRO-NN-Layers/issues/3
csyben commented 5 years ago

Hi,

thanks for your detailed description. Regarding your 3 Points:

  1. PYRO-NN wrap the PYRO-NN-Layers such that the respective operator is called for the gradient calculation. This means, if you take the parallel_backprojection2D the parallel_projection2D is automatically registered as the gradient computation operator. The results are returned as tensorflow tensors with the current context. They loose their shape as the shape interference function is missing on the C++ level. So yes it is still compatible with tensorflow placeholders. I have tested the Shape function behavior in the dev-branch. The changes are patched to the Master-branch and also an release with a new pre-built binary is provided (0.0.4)

  2. That should be fine, the problems are based on the missing shape interference and the missing handling of the batch dimension itself.

  3. That issue is probably the root for your problems. As the shape interference and batch handling is tested in the Dev-branch, the changes are patched in the Master-branch now. Also a new pre-built binary is support (release 0.0.4).

Can you test your code with the new version of the pyro-nn-layers 0.0.4 and give feedback here if the errors are gone ? Note: With the new release the operators requires the batch_size stored in the first dimension of the Tensorflow tensor. Even if the current layers are only supporting a batch_size == 1 the shape of your 2D input tensor should be [batch_size, height,width] with batch_size == 1.

i am looking forward hearing from you whether it works or not.

pgmoka commented 5 years ago

Hello,

We have implemented version 0.04, and made considerable progress. We are implementing another algorithm, called LEARN, that we have been able to implement with the version.

However, SART is still generating an error. This is it:

> Traceback (most recent call last):
>   File "main.py", line 244, in <module>
>     tf.app.run()
>   File "/home/NETID/pgmoka/anaconda3/envs/updated_pyro_0.04/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
>     _sys.exit(main(argv))
>   File "main.py", line 221, in main
>     batch_size, ckpt_dir, num_epochs, lr, sample_dir)
>   File "/data/pgmoka/CT_Image_Reconstruction/SART_and_LEARN/phase_2/model.py", line 398, in train
>     summary_writer=writer, summ_img=img)  # eval_data value range is 0-255
>   File "/data/pgmoka/CT_Image_Reconstruction/SART_and_LEARN/phase_2/model.py", line 362, in evaluate
>     clean_image, noisy_image, output_clean_image)
>   File "/data/pgmoka/CT_Image_Reconstruction/SART_and_LEARN/phase_2/utils.py", line 112, in save_images
>     cat_image = np.concatenate([ground_truth, noisy_image, clean_image], axis=1)
>   File "<__array_function__ internals>", line 6, in concatenate
> ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 512 and the array at index 1 has size 90

This might be the reason the bug above. The shapes seem to be correct, but there is still something incorrect. For more context, the method that makes the call is this:

> def rncnn(sino, numpix, numbin, numtheta, geometry, dx, numits, strat, batch_size, is_training=True, output_channels=1):
>     if(strat == 'SART'):
>         D, M, x_initial, eps = SART_variables(sino ,numpix, numbin, numtheta, geometry, dx, numits,batch_size)
>         return SART(sino, geometry, dx, numpix, numits, D, M, x_initial, eps, batch_size)
>         # D, M, x_initial, eps
>     else:
>         return LEARN(sino ,numpix, numbin, numtheta, geometry, dx, numits,batch_size)

The LEARN algorithm is his

> def LEARN(sino, numpix, numbin, numtheta, geometry, dx, numits,batch_size):
>     print("START BUILD PHASE")
>     sino = tf.reshape(sino, [1,numtheta, numbin])
>     eps = np.finfo(float).eps        
>     
>     #Set tensor for generation
>     config = tf.ConfigProto()
>     config.gpu_options.per_process_gpu_memory_fraction = 0.5
>     config.gpu_options.allow_growth = True
> 
>     
>     #x_initial = tf.convert_to_tensor(x_initial, np.float32)
>     # x^(k+1)=x^k- A^t (Ax^k-b)-conv(x^k)
>     current_x = tf.zeros([1,numpix[0],numpix[0]])
>     for iteration in range(numits):
>         print("============ LEARN iteration start ============ At iteration",iteration)
>         print("- current_x at beggining ",current_x)
>         # conv(x^k)
>         cnn_ret_val = cnn_layers(current_x, numpix, batch_size,iteration)
>         print()
>         print("- cnn_ret_val at beggining ",cnn_ret_val) 
>         with tf.variable_scope('LEARN_layer{}'.format(iteration)):
>             
>             #if(True):
>                 # tf.train.AdamOptimizer(
>             # (Ax^k-b)
>             if isinstance(geometry,GeometryParallel2D):
>                 fp = projection_2d.parallel_projection2d(current_x, geometry)
>             else:
>                 fp = projection_2d.fan_projection2d(current_x, geometry)
>                 # fp = tf.reshape(fp, [numtheta, numbin])
>             print() 
>             print("- fp ",fp)
>             mult = tf.multiply(fp,dx)
>             mult = tf.reshape(mult, [1,numtheta, numbin])
>             print()
>             print("- mult ",mult)
>             # print("mult before subtraction: ")
>             # print(mult.eval())
>             tmp1 = tf.math.subtract(mult, sino)
>             print()
>             print("- tmp1 ",tmp1)  
>             # A^t (Ax^k-b)
>             if isinstance(geometry,GeometryParallel2D):
>                 bp = backprojection_2d.parallel_backprojection2d(tmp1, geometry)
>             else:
>                 bp = backprojection_2d.fan_backprojection2d(tmp1, geometry)
>             print()
>             print("- bp ",bp)            
>             ind2 = tf.greater(tf.abs(bp), 1e3) 
>             print()
>             print("- ind2 ",ind2)    
>             zeros = tf.zeros_like(bp)
>             tmp2 = tf.where(ind2,zeros,bp)
>             print()
>             print("- tmp2 ",tmp2)    
>             # x^k- A^t (Ax^k-b)
>              
>             # current_x = tf.maximum(current_x,eps)                  #set any negative values to small positive value
>             # current_x = tf.math.subtract(current_x, cnn_ret_val) 
> 
>             current_x = tf.math.subtract(tmp2, cnn_ret_val)  
>             print()              
>             print("- current_x at end ",current_x)
>             print()
>             # current_x = tf.maximum(current_x,eps);
>             #self.lr = 1/10e4
> 
>             # for k in range(self.num_epochs):
>         print("============ LEARN iteration end ============ ")
>     print("END BUILD PHASE")
>     current_x = tf.reshape(current_x,[numpix[0], numpix[0]])
>     return current_x

And the method that creates M, D, x_initial and eps is this:

> def SART_variables(sino, numpix, numbin, numtheta, geometry, dx, numits,batch_size):
>     eps = np.finfo(float).eps        
>     
>     #Set tensor for generation
>     config = tf.ConfigProto()
>     config.gpu_options.per_process_gpu_memory_fraction = 0.5
>     config.gpu_options.allow_growth = True
>     
>     # Removes sess
>     temp_D = tf.ones([1,numtheta, numbin])
>     temp_M = tf.ones([1,numpix[0],numpix[0]])
>     print(tf.maximum(temp_M,1))
>     if isinstance(geometry,GeometryParallel2D):
>         M = projection_2d.parallel_projection2d(temp_M, geometry)
>         D = backprojection_2d.parallel_backprojection2d(temp_D, geometry)
>     else:
>         M = projection_2d.fan_projection2d(temp_M, geometry)
>         D = backprojection_2d.fan_backprojection2d(temp_D, geometry)
> 
>     D = tf.maximum(D,eps)
>     M = tf.maximum(tf.multiply(M,dx),eps)
>     
>     x_initial = tf.zeros([1,numpix[0],numpix[0]])
>     # x_initial = tf.zeros(len(numpix[0]))
>     #x_initial = tf.convert_to_tensor(x_initial, np.float32)
> 
>     return D, M, x_initial, eps

We have checked the shapes of D, M, x_initial, and eps and they seemed to be correct, so this is still a bug we are processing, but LEARN shows 0.04 is functioning. The error is something else inside of SART.

Thank you for the response.

csyben commented 4 years ago

Could you check if the error is still there in the new 0.1.0 update ? With the eager execution debugging should be easier.

csyben commented 3 years ago

closed due to inactivity