MiguelMonteiro / permutohedral_lattice

Permutohedral Lattice C++/CUDA implementation + TensorFlow Op (CPU/GPU)
one issue when compiling #11

Closed gyheart closed 5 years ago

gyheart commented 5 years ago

Hello! I use Ubuntu 16.04, Python 3.5, and TensorFlow 1.4. My compiler versions are g++ 5 and CUDA 8.0.

At the beginning of compilation, several messages show that both compilers work well. All the errors occur after 50% of the compile task is done, when I see this: [50%] Building CUDA object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cu.o

Scanning dependencies of target lattice_filter
[ 25%] Building CXX object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cpp.o
[ 50%] Building CUDA object CMakeFiles/lattice_filter.dir/src/LatticeFilterKernel.cu.o

MiguelMonteiro commented 5 years ago

Hello,

It seems that the .cu file is being included more than once. I haven't compiled this project in a while (but nothing should have changed). Did you do something with the directories and files, or did you just try to compile as is?

gyheart commented 5 years ago

Thanks for your reply. It is my error. I have solved it.

But I have a new problem when I use your CRFasRNN after an FCN in Keras. The input shape is 256*256.

First I set the batch size to 1 and defined the shape of the inputs before the CRF-as-RNN layer. The code works successfully.

Then I set the batch size to 10 and defined the shape of the inputs before the CRF-as-RNN layer. But the code doesn't work. It gives the following error:

2018-12-05 10:39:39.091356: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [655360] vs. [10]
  [[Node: training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/crfrnn_loss/mul_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape, training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape_1)]]
2018-12-05 10:39:39.094278: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [655360] vs. [10]
  [[Node: training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/crfrnn_loss/mul_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape, training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape_1)]]
2018-12-05 10:39:39.094375: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [655360] vs. [10]
  [[Node: training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/crfrnn_loss/mul_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape, training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape_1)]]
Traceback (most recent call last):
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [655360] vs. [10]
  [[Node: training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/crfrnn_loss/mul_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape, training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape_1)]]
  [[Node: crfrnn/spatial_ker_weights/read/_735 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_866_crfrnn/spatial_ker_weights/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "qq_train.py", line 85, in <module>
    shuffle=True)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/engine/training.py", line 2177, in fit_generator
    class_weight=class_weight)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1849, in train_on_batch
    outputs = self.train_function(ins)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2475, in __call__
    **self.session_kwargs)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [655360] vs. [10]
  [[Node: training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/crfrnn_loss/mul_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape, training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape_1)]]
  [[Node: crfrnn/spatial_ker_weights/read/_735 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_866_crfrnn/spatial_ker_weights/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/BroadcastGradientArgs', defined at:
  File "qq_train.py", line 85, in <module>
    shuffle=True)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/engine/training.py", line 2026, in fit_generator
    self._make_train_function()
  File "/home/amax/.local/lib/python3.5/site-packages/keras/engine/training.py", line 970, in _make_train_function
    loss=self.total_loss)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/optimizers.py", line 434, in get_updates
    grads = self.get_gradients(loss, params)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/optimizers.py", line 78, in get_gradients
    grads = K.gradients(loss, params)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2512, in gradients
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_grad.py", line 742, in _MulGrad
    rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 532, in _broadcast_gradient_args
    "BroadcastGradientArgs", s0=s0, s1=s1, name=name)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op 'loss/crfrnn_loss/mul_1', defined at:
  File "qq_train.py", line 46, in <module>
    lr_init=lr_init, vgg_weight_path=vgg_path, weight_decay=0.0001)
  File "/home/amax/gy/keras_fc/fcncrf/qq_fcn.py", line 318, in fcn_8s
    metrics=[dice_coef, f1_score])
  File "/home/amax/.local/lib/python3.5/site-packages/keras/engine/training.py", line 827, in compile
    sample_weight, mask)
  File "/home/amax/.local/lib/python3.5/site-packages/keras/engine/training.py", line 442, in weighted
    score_array *= weights
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1117, in _mul_dispatch
    return gen_math_ops._mul(x, y, name=name)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2726, in _mul
    "Mul", x=x, y=y, name=name)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/amax/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [655360] vs. [10]
  [[Node: training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/crfrnn_loss/mul_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape, training/Adam/gradients/loss/crfrnn_loss/mul_1_grad/Shape_1)]]
  [[Node: crfrnn/spatial_ker_weights/read/_735 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_866_crfrnn/spatial_ker_weights/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

MiguelMonteiro commented 5 years ago

The input shape of the CRFasRNN layer should be [batch_size, spatial_dim_1, spatial_dim_2, ..., spatial_dim_m, num_classes].
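
A quick way to check this layout before building the model is a small shape test. The sketch below is only illustrative: `check_crf_input_shape` is a hypothetical helper, not part of this repository, and it works on plain shape tuples rather than tensors.

```python
def check_crf_input_shape(shape, spatial_dims, num_channels):
    """Return True if `shape` is (batch, spatial_1..spatial_m, channels).

    Hypothetical helper for illustration; not part of permutohedral_lattice.
    """
    shape = tuple(shape)
    # One batch axis + m spatial axes + one channel axis.
    if len(shape) != spatial_dims + 2:
        return False
    # The last axis must hold the channels (classes or reference channels).
    return shape[-1] == num_channels


# Class-score input for a 2-D, 2-class problem with batch size 10.
print(check_crf_input_shape((10, 256, 256, 2), spatial_dims=2, num_channels=2))
# A flattened tensor loses the spatial axes and fails the check.
print(check_crf_input_shape((10, 256 * 256, 2), spatial_dims=2, num_channels=2))
```

The same check can be applied to the reference image by passing its channel count instead of the number of classes.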

gyheart commented 5 years ago

My input shapes for the CRFasRNN layer are [10, 256, 256, 2] and [10, 256, 256, 3].

MiguelMonteiro commented 5 years ago

Have you compiled for the correct number of dimensions and channels?

gyheart commented 5 years ago

I set SPATIAL_DIMS=2, INPUT_CHANNELS=2 (object/background) and REFERENCE_CHANNELS=3 (RGB). When the batch size is 1, the code works successfully.
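
The numbers in the error are consistent with a broadcasting problem: 655360 = 10 × 256 × 256, so one operand of `score_array *= weights` appears to hold one value per pixel of the whole batch while the other holds one value per sample. This reading is an assumption, since the actual cause was never posted, but it would also explain why batch size 1 works: a length-1 tensor broadcasts against any shape. The sketch below applies the usual NumPy/TensorFlow broadcasting rule to shape tuples; `broadcastable` is a hypothetical helper written for this illustration.

```python
from itertools import zip_longest


def broadcastable(shape_a, shape_b):
    """True if two shapes are compatible under NumPy-style broadcasting."""
    # Compare axes from the right; a missing axis counts as size 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            return False
    return True


batch, height, width = 10, 256, 256
flat = batch * height * width
print(flat)                                    # 655360
print(broadcastable((flat,), (batch,)))        # False
print(broadcastable((height * width,), (1,)))  # True
```

If this is what happened, a loss function that returns one value per sample (or keeps the batch axis first) would avoid the mismatch.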

MiguelMonteiro commented 5 years ago

Did you manage to fix it? What was the problem?

gyheart commented 5 years ago

I have fixed it. I made a stupid mistake in my code. Thank you for your help.

But the CRF layer doesn't seem to affect the output. The outputs of FCN + your CRF and of the FCN alone were (visually) identical! I set theta_alpha = 160, theta_beta = 3, theta_gamma = 3, as in the original settings, and the reference image is within [0, 255].

How can I use your CRF layer so that it produces the same effect as the original version?

MiguelMonteiro commented 5 years ago

Have you trained the algorithm from scratch?

gyheart commented 5 years ago

Yes, I trained the algorithm from scratch. Is the theta setting wrong?

MiguelMonteiro commented 5 years ago

What were the final values of the compatibility matrix and the weights of the CRFasRNN layer? As for the thetas: if theta is very large (e.g. 160), this is the same as bypassing the filter. You can also try normalizing the images to between 0 and 1.
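
To see why a large theta flattens the filter, it helps to look at the Gaussian weight exp(-d² / (2θ²)) that such a kernel assigns to two points a distance d apart. The sketch below assumes that standard Gaussian form (the exact kernel used in this repository is not quoted in this thread), and both function names are hypothetical.

```python
import math


def gaussian_weight(distance, theta):
    """Weight exp(-d^2 / (2 * theta^2)) for two points `distance` apart."""
    return math.exp(-(distance ** 2) / (2.0 * theta ** 2))


def normalize_intensity(value, max_value=255.0):
    """Map an intensity from [0, max_value] to [0, 1]."""
    return value / max_value


# With theta = 160, pixels 100 apart still get a weight above 0.8, so on a
# 256x256 image the kernel is close to uniform; with theta = 3 it is local.
for theta in (3, 160):
    print(theta, [round(gaussian_weight(d, theta), 3) for d in (1, 10, 100)])

# Rescaling shrinks feature differences: a 51-level gap in [0, 255]
# becomes a gap of about 0.2 in [0, 1], so theta must match that scale.
print(normalize_intensity(51))
```

Note that after normalizing the reference image to [0, 1], a colour theta chosen for the [0, 255] range would itself be very large relative to the data, so it would likely need to be rescaled as well.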