RenYang-home / OpenDVC

BSD 3-Clause "New" or "Revised" License

Issue when running training code #9

Closed Adithya-MN closed 3 years ago

Adithya-MN commented 3 years ago

Firstly, I'd like to thank you for sharing this open-source version of DVC!

I was able to run inference on the BasketballPass dataset. However, I ran into some issues while trying to train the model using the training instructions given:

python OpenDVC_train_PSNR.py --l 1024

I get the following error -

2021-02-14 07:13:48.321268: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-02-14 07:13:48.337753: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-02-14 07:14:06.276314: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at mkl_slice_op.cc:303 : Aborted: Operation received an exception:Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:300

I am currently using TF 1.12 and other libraries as specified in the run instructions and have installed a compatible version of MKL and MKLDNN.

Would you have any insights on what might be causing this issue? Any help would be really appreciated!

RenYang-home commented 3 years ago

I am not sure why the error occurs, but the inference and training code should depend on the same packages, so if inference runs, the environment seems to be correct. Are you training on a CPU or a GPU (and which GPU)? You could try training on a more capable device. Also, please paste the whole log so that we can look into it further.

Adithya-MN commented 3 years ago

Thank you for the fast response! I'm running the code on GPUs in a DGX system. It has four 32 GB V100s, so hopefully the device is not an issue.

Here's the full error log -


(RLVC) csr_guest@csr-dgx1-03:~/aniranja/RLVC/OpenDVC$ python OpenDVC_train_PSNR.py --l 1024
/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-02-15 23:28:49.746333: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-02-15 23:28:49.763038: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-02-15 23:29:07.956119: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at mkl_slice_op.cc:303 : Aborted: Operation received an exception:Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:300
Traceback (most recent call last):
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.AbortedError: Operation received an exception:Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:300
         [[{{node gradients/flow_motion/flow_cnn_4/concat_grad/Slice_2}} = _MklSlice[Index=DT_INT32, T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradients/flow_motion/flow_cnn_4/conv2d/Conv2D_grad/Conv2DBackpropInput, ConstantFolding/gradients/flow_motion/flow_cnn_1/concat_grad/ConcatOffset-folded-2, gradients/flow_motion/flow_cnn_4/concat_grad/Shape_2, gradients/flow_motion/flow_cnn_4/conv2d/Conv2D_grad/Conv2DBackpropInput:1, DMT/_476, DMT/_477)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "OpenDVC_train_PSNR.py", line 166, in <module>
    learning_rate: lr})
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.AbortedError: Operation received an exception:Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:300
         [[node gradients/flow_motion/flow_cnn_4/concat_grad/Slice_2 (defined at OpenDVC_train_PSNR.py:102)  = _MklSlice[Index=DT_INT32, T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradients/flow_motion/flow_cnn_4/conv2d/Conv2D_grad/Conv2DBackpropInput, ConstantFolding/gradients/flow_motion/flow_cnn_1/concat_grad/ConcatOffset-folded-2, gradients/flow_motion/flow_cnn_4/concat_grad/Shape_2, gradients/flow_motion/flow_cnn_4/conv2d/Conv2D_grad/Conv2DBackpropInput:1, DMT/_476, DMT/_477)]]

Caused by op 'gradients/flow_motion/flow_cnn_4/concat_grad/Slice_2', defined at:
  File "OpenDVC_train_PSNR.py", line 102, in <module>
    train_MV = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(train_loss_MV, global_step=step)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
    grad_loss=grad_loss)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py", line 222, in _ConcatGradV2
    op, grad, start_value_index=0, end_value_index=-1, dim_index=-1)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py", line 150, in _ConcatGradHelper
    out_grads.append(array_ops.slice(grad, begin, size))
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 578, in slice
    return gen_array_ops._slice(input_, begin, size, name=name)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 7466, in _slice
    "Slice", input=input, begin=begin, size=size, name=name)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'flow_motion/flow_cnn_4/concat', defined at:
  File "OpenDVC_train_PSNR.py", line 45, in <module>
    flow_tensor, _, _, _, _, _ = motion.optical_flow(Y0_com, Y1_raw, batch_size, Height, Width)
  File "/home/csr_guest/aniranja/RLVC/OpenDVC/motion.py", line 54, in optical_flow
    loss_4, flow_4 = loss(flow_3, im1_4, im2_4, 4)
  File "/home/csr_guest/aniranja/RLVC/OpenDVC/motion.py", line 27, in loss
    res = convnet(im1_warped, im2, flow, layer)
  File "/home/csr_guest/aniranja/RLVC/OpenDVC/motion.py", line 7, in convnet
    input = tf.concat([im1_warp, im2, flow], axis=-1)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1124, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1033, in concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/csr_guest/anaconda3/envs/RLVC/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

AbortedError (see above for traceback): Operation received an exception:Status: 5, message: could not create a view primitive descriptor, in file tensorflow/core/kernels/mkl_slice_op.cc:300
         [[node gradients/flow_motion/flow_cnn_4/concat_grad/Slice_2 (defined at OpenDVC_train_PSNR.py:102)  = _MklSlice[Index=DT_INT32, T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](gradients/flow_motion/flow_cnn_4/conv2d/Conv2D_grad/Conv2DBackpropInput, ConstantFolding/gradients/flow_motion/flow_cnn_1/concat_grad/ConcatOffset-folded-2, gradients/flow_motion/flow_cnn_4/concat_grad/Shape_2, gradients/flow_motion/flow_cnn_4/conv2d/Conv2D_grad/Conv2DBackpropInput:1, DMT/_476, DMT/_477)]]
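One detail in the log above is worth reading closely: the failing node reports _kernel="MklOp" and a device string ending in device:CPU:0, so the slice op that raised the error was executed by an MKL kernel on the CPU. The log itself does not say why the op ended up there. As a minimal sketch (the helper name node_placement is made up for this example, and the node's input list is shortened to "(...)"), these fields can be pulled out of such a message like this:

```python
import re

# One node description from the log above; the input list is shortened to "(...)".
ERROR_LINE = (
    '[[node gradients/flow_motion/flow_cnn_4/concat_grad/Slice_2 '
    '(defined at OpenDVC_train_PSNR.py:102)  = _MklSlice[Index=DT_INT32, '
    'T=DT_FLOAT, _kernel="MklOp", '
    '_device="/job:localhost/replica:0/task:0/device:CPU:0"](...)]]'
)


def node_placement(message):
    """Return (op_type, kernel, device) parsed from a node description.

    Any field that cannot be found is returned as None.
    """
    op_type = re.search(r"=\s*(\w+)\[", message)
    kernel = re.search(r'_kernel="([^"]*)"', message)
    device = re.search(r'_device="[^"]*/device:(\w+:\d+)"', message)
    return tuple(m.group(1) if m else None for m in (op_type, kernel, device))


print(node_placement(ERROR_LINE))  # ('_MklSlice', 'MklOp', 'CPU:0')
```

A device of CPU:0 on a gradient op is a hint that at least part of the graph is not running on the GPU, which is useful to know when comparing environments.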

The same error appears after some cleanup and removal of the deprecation warnings; the output is identical to the log above from the "OP_REQUIRES failed at mkl_slice_op.cc:303" line onward.

RenYang-home commented 3 years ago

I am sorry, but I currently have no idea what the problem is: our hardware is different and I have never encountered this issue. Could you try training a simpler network to see whether the problem persists, e.g., keep only the motion.optical_flow call and optimize the MSE between the warped and raw images?

Keep only

flow_tensor, _, _, _, _, _ = motion.optical_flow(Y0_com, Y1_raw, batch_size, Height, Width)
Y1_warp_0 = tf.contrib.image.dense_image_warp(Y0_com, flow_tensor)

minimize only

tf.reduce_mean(tf.squared_difference(Y1_warp_0, Y1_raw))

and see whether that works.
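Written out, the reduced objective suggested above is a plain mean squared error between the warped previous frame and the raw current frame. The symbols are introduced here only for illustration: \hat{x}_0 stands for Y0_com, x_1 for Y1_raw, f for flow_tensor, \tilde{x}_1 for Y1_warp_0, and N for the total number of tensor elements. The indexing convention for the warp follows the documented behaviour of dense_image_warp as far as I recall it, with bilinear interpolation at non-integer positions, so treat it as an assumption to check against the TensorFlow version in use.

```latex
\mathcal{L}_{\text{warp}}
  = \frac{1}{N} \sum_{b,j,i,c}
    \bigl( \tilde{x}_1[b,j,i,c] - x_1[b,j,i,c] \bigr)^2,
\qquad
\tilde{x}_1[b,j,i,c]
  = \hat{x}_0\bigl[\, b,\; j - f[b,j,i,0],\; i - f[b,j,i,1],\; c \,\bigr]
```

Only the optical-flow network contributes trainable parameters to this loss, which is what makes it a useful isolation test.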

Adithya-MN commented 3 years ago

I resolved this issue by switching to TensorFlow 1.14 and tensorflow-compression 1.2. I installed tfc using pip rather than using a folder in the same path, which seems to have fixed the error.
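Because the fix involved where tensorflow-compression was imported from, it can help to check which copy of a package Python would actually pick up. The following is a minimal sketch using only the standard library; the helper names import_origin and is_pip_style_install are made up for this example, and treating a "site-packages" or "dist-packages" path component as "installed by pip" is a heuristic, not a guarantee.

```python
import importlib.util
import os


def import_origin(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # module is not importable from the current sys.path
    return spec.origin


def is_pip_style_install(module_name):
    """Heuristic: True if the module lives under site-packages/dist-packages.

    Returns None when the module cannot be found or has no file origin.
    """
    origin = import_origin(module_name)
    if origin is None:
        return None
    parts = origin.split(os.sep)
    return "site-packages" in parts or "dist-packages" in parts


for name in ("tensorflow", "tensorflow_compression"):
    print(name, import_origin(name), is_pip_style_install(name))
```

If tensorflow_compression resolves to a folder next to the training script rather than to the environment's site-packages, the local copy is shadowing the pip-installed one.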

Thank you for helping :)

Are there any plans to release RLVC soon? Alternatively, would you have any guidelines for replicating the training loop, now that the work is accepted?

RenYang-home commented 3 years ago

Happy to hear that the problem is solved. RLVC has been released at https://github.com/RenYang-home/RLVC. Training can follow the description in our paper; we may not release the training code. Thanks for understanding.

Adithya-MN commented 3 years ago

Okay, thanks for the update. One last question: I noticed you had mentioned in the discussion thread here that the HLVC training code can be obtained by substitution in OpenDVC. Would a similar substitution or modification be possible for RLVC? I just wanted your thoughts.

RenYang-home commented 3 years ago

Since RLVC contains a recurrent framework, its training is more complicated than that of OpenDVC/HLVC. It first needs to be warmed up by training on one time step and then fine-tuned to a longer time length, so its training procedure cannot be borrowed directly from OpenDVC. In the end, though, the general idea is almost the same, i.e., progressive training: flow -> flow + motion compression -> then + MC -> then + residual compression -> then fine-tune to a longer time length.
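The progressive schedule described above can be written down as data, which makes the order of the stages explicit. This is only a sketch of that ordering, not the authors' training code: the component labels, the function name progressive_schedule, and the value used for the longer time length are all placeholders chosen for this example, and reading "MC" as motion compensation and the first four stages as single-time-step training is my interpretation of the comment.

```python
# Hypothetical labels for the components named in the comment above.
# "MC" is read here as motion compensation.
COMPONENTS = [
    "flow",
    "motion_compression",
    "motion_compensation",
    "residual_compression",
]


def progressive_schedule(components, long_length):
    """Build the stage list: add one component per stage, then lengthen.

    Each of the first len(components) stages trains on a single time step
    and includes every component introduced so far. The last stage keeps
    all components and fine-tunes on `long_length` time steps.
    """
    schedule = []
    for count in range(1, len(components) + 1):
        schedule.append({"modules": components[:count], "time_steps": 1})
    schedule.append({"modules": list(components), "time_steps": long_length})
    return schedule


# long_length=4 is a placeholder value, not a number taken from the thread.
for stage_index, stage in enumerate(progressive_schedule(COMPONENTS, 4), 1):
    print(stage_index, stage["time_steps"], " + ".join(stage["modules"]))
```

Keeping the schedule as a list like this makes it straightforward to loop over the stages and restore the previous stage's checkpoint before adding the next component.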

Adithya-MN commented 3 years ago

Right - that makes sense. Thanks once again for all the responses!