After training, testing code gives error like "r2n/Reshape_9:0 is missing"

Ajithbalakrishnan commented 5 years ago

Hi, I have tried both retraining the pre-trained network and training from scratch, using the training code you provided. After 400 epochs I saved both models, but I get the same error with both when testing. The error is raised at the line below:

Y_pred = tf.get_default_graph().get_tensor_by_name("r2n/Reshape_9:0")

The r2n/Reshape_9 tensor is not in my graph; I confirmed this with TensorBoard. But the graph of the downloaded pre-trained network does contain that tensor. How is that possible?

One interesting thing: I changed the line as below

Y_pred = tf.get_default_graph().get_tensor_by_name("r2n/Reshape_7:0")

Then I tested with my trained network, and it gives output, because in the network I trained myself the r2n block produces the 32x32x32 voxels at r2n/Reshape_7:0.

Can you explain why this is so?

Yang7879 commented 5 years ago

@Ajithbalakrishnan Sorry for the delay. Have you solved it? Which version of TF do you use? It seems the default name of the tensor has changed.
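
A note on why the name can shift: in TF 1.x graph mode, ops created without an explicit name are numbered in creation order within their scope (Reshape, Reshape_1, ...), so a graph rebuilt by a different code path or TF version can plausibly end its r2n scope at Reshape_7 instead of Reshape_9. The sketch below lists candidate tensors rather than hard-coding one. `list_output_tensors` is a hypothetical helper, not part of AttSets or TensorFlow; it only assumes the `get_operations()`, `op.type`, `op.name` and `op.outputs` attributes of a TF 1.x graph, and the demo uses stand-in objects so it runs without TensorFlow.

```python
from types import SimpleNamespace


def list_output_tensors(graph, scope_prefix, op_type):
    """Return output tensor names of `op_type` ops under `scope_prefix`."""
    names = []
    for op in graph.get_operations():
        if op.type == op_type and op.name.startswith(scope_prefix):
            names.extend(t.name for t in op.outputs)
    return names


def _fake_op(name, op_type):
    # Stand-in for a tf.Operation with a single output tensor.
    return SimpleNamespace(name=name, type=op_type,
                           outputs=[SimpleNamespace(name=name + ":0")])


# Stand-in graph (hypothetical op list, for illustration only).
fake_graph = SimpleNamespace(get_operations=lambda: [
    _fake_op("r2n/Reshape", "Reshape"),
    _fake_op("r2n/Relu", "Relu"),
    _fake_op("r2n/Reshape_7", "Reshape"),
])

candidates = list_output_tensors(fake_graph, "r2n/", "Reshape")
print(candidates)
```

With a restored model, `tf.get_default_graph()` could be passed in place of `fake_graph`. A more robust option, if the training code can be edited, is to give the output an explicit name (for example by wrapping it in `tf.identity(..., name="Y_pred")`) so the lookup no longer depends on auto-numbering.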

Ajithbalakrishnan commented 5 years ago

Thanks for the reply, sir. Yes, I solved it. After training from scratch I took the voxel output (Y_pred) from the "r2n/Reshape_7:0" tensor.

One more doubt: in the paper you said the loss function is IoU, but the implementation seems to use the cross-entropy loss described in the base paper, 3D-R2N2. Why is that?

I found the following candidate loss functions:
1. IoU
2. Chamfer distance
3. Cross entropy
4. Earth mover's distance ( https://arxiv.org/abs/1612.00603 )

How can I select a good loss function from these?

Yang7879 commented 5 years ago

@Ajithbalakrishnan For voxel prediction, the cross-entropy loss is usually used, since predicting each voxel's occupancy is a binary classification problem.
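
To make the binary-classification view concrete: each voxel is treated as an independent occupied/empty decision, and the loss averages the per-voxel cross-entropy. The sketch below is a dependency-free illustration under that assumption, not the AttSets training loss itself; `voxel_bce` and `eps` are names introduced here.

```python
import math


def voxel_bce(pred, target, eps=1e-8):
    """Mean binary cross-entropy over flattened occupancy probabilities.

    pred:   iterable of predicted occupancy probabilities in [0, 1]
    target: iterable of ground-truth occupancies (0 or 1)
    """
    pred = list(pred)
    target = list(target)
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


uncertain = voxel_bce([0.5, 0.5], [1, 0])   # equals log(2)
confident = voxel_bce([0.9, 0.1], [1, 0])   # lower loss for better predictions
print(uncertain, confident)
```

Unlike a thresholded IoU, this quantity is differentiable in the predicted probabilities, which is one reason it is convenient as a training loss while IoU is reported as an evaluation metric.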

Ajithbalakrishnan commented 5 years ago

Thanks, I got it. Can you please share the code for computing the IoU? I tried the code below, but it fails with this error:

InvalidArgumentError (see above for traceback): assertion failed: [labels out of bound] [Condition x < y did not hold element-wise:] [x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [1]

Code -

import tensorflow as tf
import os
import sys
sys.path.append('..')
import tools as tools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
GPU='0'

vox_res = 32

def load_real_rgbs(test_mv=3):
    obj_rgbs_folder ='./Data_sample/amazon_real_rgbs/lamp/'
    rgbs = []
    rgbs_views = sorted(os.listdir(obj_rgbs_folder))
    for v in rgbs_views:
        if not v.endswith('png'): continue
        rgbs.append(tools.Data.load_single_X_rgb_r2n2(obj_rgbs_folder + v, train=False))
    rgbs = np.asarray(rgbs)
    x_sample = rgbs[0:test_mv, :, :, :].reshape(1, test_mv, 127, 127, 3)
    return x_sample, None

def load_shapenet_rgbs(test_mv=3):
    obj_rgbs_folder = './Data_sample/ShapeNetRendering/03001627/1a6f615e8b1b5ae4dbbc9440457e303e/rendering/'
    obj_gt_vox_path ='./Data_sample/ShapeNetVox32/03001627/1a6f615e8b1b5ae4dbbc9440457e303e/model.binvox'
    rgbs=[]
    rgbs_views = sorted(os.listdir(obj_rgbs_folder))
    for v in rgbs_views:
        if not v.endswith('png'): continue
        rgbs.append(tools.Data.load_single_X_rgb_r2n2(obj_rgbs_folder + v, train=False))
    rgbs = np.asarray(rgbs)
    x_sample = rgbs[0:test_mv, :, :, :].reshape(1, test_mv, 127, 127, 3)
    y_true = tools.Data.load_single_Y_vox(obj_gt_vox_path)
    #########################################
    Y_true_vox = []
    Y_true_vox.append(y_true)
    Y_true_vox = np.asarray(Y_true_vox)
    return x_sample, Y_true_vox

#########################################
def ttest_demo():
    model_path = './Model_released/'
    if not os.path.isfile(model_path + 'model.cptk.data-00000-of-00001'):
        print ('please download our released model first!')
        return

    config = tf.ConfigProto(allow_soft_placement=True)
    config.gpu_options.visible_device_list = GPU
    with tf.Session(config=config) as sess:
        saver = tf.train.import_meta_graph(model_path + 'model.cptk.meta', clear_devices=True)
        saver.restore(sess, model_path + 'model.cptk')
        print ('model restored!')

        X = tf.get_default_graph().get_tensor_by_name("Placeholder:0")
        Y_pred = tf.get_default_graph().get_tensor_by_name("r2n/Reshape_9:0")

        x_sample, gt_vox = load_shapenet_rgbs()

        ### IOU #########################################################
        gt_vox=gt_vox.astype(np.float64)

        Y_vox_ = tf.reshape(gt_vox, shape=[-1, vox_res ** 3,1])
        Y_pred_ = tf.reshape(Y_pred, shape=[-1, vox_res ** 3,1])
        iou = tf.metrics.mean_iou(labels=Y_vox_,predictions=Y_pred_,num_classes=1)
        sess.run(tf.local_variables_initializer())

        #########################################################
        ## session run
        y_pred,recon_loss,iou_value = sess.run([Y_pred, rec_loss,iou], feed_dict={X: x_sample})
        print("IOU :",iou_value)

    ###### to visualize
    th = 0.25
    y_pred[y_pred>=th]=1
    y_pred[y_pred<th]=0
    tools.Data.plotFromVoxels(np.reshape(y_pred,[32,32,32]), title='y_pred')
    if gt_vox is not None:
        tools.Data.plotFromVoxels(np.reshape(gt_vox,[32,32,32]), title='y_true')
    from matplotlib.pyplot import show
    show()
    #########################

if __name__ == '__main__':
    ttest_demo()
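
The assertion quoted above is consistent with num_classes=1: tf.metrics.mean_iou expects every label and prediction to be an integer class id below num_classes, so occupied voxels (label 1) fail the x < y check when y is 1. For binary occupancy, num_classes=2 with predictions thresholded to 0/1 beforehand would be the usual setup, since float probabilities are otherwise truncated to integers. The dependency-free sketch below mimics that bound check and the per-class IoU averaging; `mean_iou_sketch` is an illustrative name and this is not TensorFlow's implementation.

```python
def mean_iou_sketch(labels, predictions, num_classes):
    """Mean of per-class IoU from a confusion matrix over integer class ids."""
    for name, values in (("labels", labels), ("predictions", predictions)):
        if any(v < 0 or v >= num_classes for v in values):
            raise ValueError("%s out of bound for num_classes=%d" % (name, num_classes))

    # cm[t][p] counts voxels with true class t predicted as class p.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, predictions):
        cm[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        union = sum(cm[c]) + sum(row[c] for row in cm) - tp
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


labels = [1, 1, 0, 0]
predictions = [1, 0, 0, 0]          # already thresholded to 0/1
binary_miou = mean_iou_sketch(labels, predictions, num_classes=2)
print(binary_miou)

try:
    mean_iou_sketch(labels, predictions, num_classes=1)
    single_class_error = None
except ValueError as err:
    single_class_error = str(err)
print(single_class_error)
```

Note that a mean over both classes also counts the empty-space class, so it is generally not the same number as an occupancy-only IoU.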

Yang7879 commented 5 years ago

@Ajithbalakrishnan Here's the script for IoU calculation.

import copy
import numpy as np

def metric_IoU(batch_voxel_occup_pred, batch_voxel_occup_true):
    batch_voxel_occup_pred_ = copy.deepcopy(batch_voxel_occup_pred)
    batch_voxel_occup_pred_[batch_voxel_occup_pred_ >= 0.5] = 1
    batch_voxel_occup_pred_[batch_voxel_occup_pred_ < 0.5] = 0
    I = batch_voxel_occup_pred_ * batch_voxel_occup_true
    U = batch_voxel_occup_pred_ + batch_voxel_occup_true
    U[U < 1] = 0
    U[U >= 1] = 1
    iou = np.sum(I) * 1.0 / np.sum(U) * 1.0
    return iou
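
As a small worked check of the same arithmetic, the sketch below recomputes intersection over union on five voxels in plain Python. `occupancy_iou` and its `threshold` parameter are names assumed here, and returning NaN for an empty union is a choice made for this sketch rather than something the script above specifies.

```python
def occupancy_iou(pred, true, threshold=0.5):
    """IoU of occupied voxels after binarizing predictions at `threshold`."""
    pred_bin = [1 if p >= threshold else 0 for p in pred]
    inter = sum(p * t for p, t in zip(pred_bin, true))
    union = sum(1 for p, t in zip(pred_bin, true) if p + t >= 1)
    return inter / union if union else float("nan")


pred = [0.9, 0.6, 0.2, 0.7, 0.3]
true = [1, 0, 0, 1, 1]

iou_050 = occupancy_iou(pred, true, threshold=0.5)    # 2 shared / 4 in union
iou_025 = occupancy_iou(pred, true, threshold=0.25)   # 3 shared / 4 in union
print(iou_050, iou_025)
```

The two calls differ only in the threshold, which shows that a reported IoU depends on the cut-off: the visualization code earlier in this thread binarizes at 0.25, while the script above uses 0.5.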

Ajithbalakrishnan commented 5 years ago

Thank you, it works. Instead of cross-entropy, have you ever tried any other loss functions, such as:
1. Earth mover's distance ( https://arxiv.org/abs/1612.00603 )
2. Mean squared false cross-entropy loss (MSFCEL) ( https://arxiv.org/abs/1804.06375 )

or any other? Can you please share your opinion?

Yang7879 commented 5 years ago

@Ajithbalakrishnan sorry, I didn't try it.

Ajithbalakrishnan commented 5 years ago

OK, thanks. Until now I had only trained from scratch. But when I retrain the released model (after uncommenting the line in main_AttSets.py), I get the error given below.

total weights: 52590114
2019-06-12 04:38:53.597409: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX FMA
2019-06-12 04:38:53.597951: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
restoring saved model!
2019-06-12 04:38:57.550933: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key beta1_power_2 not found in checkpoint
Traceback (most recent call last):
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key beta1_power_2 not found in checkpoint
  [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1538, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1110, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run
    run_metadata)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key beta1_power_2 not found in checkpoint
  [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "main_AttSets.py", line 475, in <module>
    net.build_graph()
  File "main_AttSets.py", line 378, in build_graph
    self.saver = tf.train.Saver(max_to_keep=1)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1094, in __init__
    self.build()
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1106, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1143, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
    restore_sequentially, reshape)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 854, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key beta1_power_2 not found in checkpoint [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1822, in object_graph_key_mapping
    checkpointable.OBJECT_GRAPH_PROTO_KEY)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 359, in get_tensor
    status)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 526, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_AttSets.py", line 475, in <module>
    net.build_graph()
  File "main_AttSets.py", line 392, in build_graph
    self.saver.restore(self.sess, path + 'model.cptk')
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1554, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key beta1_power_2 not found in checkpoint [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "main_AttSets.py", line 475, in <module>
    net.build_graph()
  File "main_AttSets.py", line 378, in build_graph
    self.saver = tf.train.Saver(max_to_keep=1)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1094, in __init__
    self.build()
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1106, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1143, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
    restore_sequentially, reshape)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 854, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/home/wiproec4/anaconda3/envs/attsets/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key beta1_power_2 not found in checkpoint [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
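
A possible reading of this trace, offered as an assumption rather than a confirmed diagnosis: beta1_power is a bookkeeping variable created by the TF 1.x Adam optimizer, and the _2 suffix suggests the rebuilt graph creates more Adam optimizers (or names them differently) than the graph that wrote the released checkpoint. One common workaround is to restore only the variables whose names exist in the checkpoint and let the remaining optimizer state start fresh. The sketch below shows just the name-matching step in plain Python; `split_by_checkpoint` and the example variable names are hypothetical, and the TensorFlow wiring in the comments is untested.

```python
def split_by_checkpoint(graph_var_names, checkpoint_keys):
    """Split graph variable names into (present, missing) w.r.t. checkpoint keys."""
    keys = set(checkpoint_keys)
    present, missing = [], []
    for name in graph_var_names:
        key = name.split(":")[0]  # "scope/w:0" -> "scope/w"
        (present if key in keys else missing).append(name)
    return present, missing


# Hypothetical names for illustration only.
graph_vars = ["r2n/w:0", "beta1_power:0", "beta1_power_2:0"]
ckpt_keys = ["r2n/w", "beta1_power"]

present, missing = split_by_checkpoint(graph_vars, ckpt_keys)
print("restore:", present)
print("initialize fresh:", missing)

# Rough TF 1.x wiring (an untested assumption, adapt to main_AttSets.py):
#   keys = [n for n, _ in tf.train.list_variables(path + 'model.cptk')]
#   present, _ = split_by_checkpoint([v.name for v in tf.global_variables()], keys)
#   restore_vars = [v for v in tf.global_variables() if v.name in present]
#   sess.run(tf.global_variables_initializer())  # covers the missing ones
#   tf.train.Saver(var_list=restore_vars).restore(sess, path + 'model.cptk')
```

Printing the `missing` list first is a cheap way to confirm whether only optimizer variables are affected before changing how the saver is built.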

Yang7879 / AttSets

After training testing code gives error like "r2n/Reshape_9:0 is missing #4