I'm trying to evaluate TAP-Net on the Kubric dataset, and I'm getting the error shown below. I am running the following command:
python3 ./tapnet/experiment.py --config=./tapnet/configs/tapnet_config.py --jaxline_mode=eval_kubric --config.checkpoint_dir=/data3/tap/tap/tapnet_checkpoint/
Any idea how to fix this? Thanks!
I0228 19:56:23.706737 140398657533120 train.py:152] Evaluating with config:
best_model_eval_metric: ''
best_model_eval_metric_higher_is_better: true
checkpoint_dir: /data3/tap//tap/tapnet_checkpoint/
checkpoint_interval_type: null
dataset_names: &id001 !!python/tuple
- kubric
eval_initial_weights: true
eval_modes: &id002 !!python/tuple
- eval_davis_points
- eval_jhmdb
- eval_robotics_points
- eval_kinetics_points
evaluate_every: 10000
experiment_kwargs:
  config:
    checkpoint_dir: /data3/tap//tap/tapnet_checkpoint/
    datasets:
      dataset_names: *id001
      kubric_kwargs:
        batch_dims: 8
        shuffle_buffer_size: 128
        train_size: !!python/tuple
        - 256
        - 256
    davis_points_path: ''
    eval_modes: *id002
    evaluate_every: 10000
    fast_variables: !!python/tuple []
    inference:
      input_video_path: ''
      num_points: 20
      output_video_path: ''
      resize_height: 256
      resize_width: 256
    jhmdb_path: ''
    optimizer:
      adam_kwargs:
        b1: 0.9
        b2: 0.95
        eps: 1.0e-08
      base_lr: 0.002
      cosine_decay_kwargs:
        end_value: 0.0
        init_value: 0.0
        warmup_steps: 5000
      max_norm: -1
      optimizer: adam
      schedule_type: cosine
      weight_decay: 0.01
    robotics_points_path: ''
    save_final_checkpoint_as_npy: true
    shared_modules:
      shared_module_names: &id003 !!python/tuple
      - tapnet_model
      tapnet_model_kwargs: {}
    supervised_point_prediction_kwargs:
      prediction_algo: cost_volume_regressor
    sweep_name: default_sweep
    training:
      n_training_steps: 100000
interval_type: secs
log_all_train_data: false
log_tensors_interval: 60
log_train_data_interval: 120.0
logging_interval_type: null
max_checkpoints_to_keep: 5
one_off_evaluate: false
random_mode_eval: same_host_same_device
random_mode_train: unique_host_unique_device
random_seed: 42
save_checkpoint_interval: 10
shared_module_names: *id003
train_checkpoint_all_hosts: false
training_steps: 100000
I0228 19:56:23.755014 140398657533120 xla_bridge.py:173] Remote TPU is not linked into jax; skipping remote TPU.
I0228 19:56:23.755347 140398657533120 xla_bridge.py:357] Unable to initialize backend 'tpu_driver': Could not initialize backend 'tpu_driver'
I0228 19:56:24.424445 140398657533120 xla_bridge.py:357] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I0228 19:56:24.425570 140398657533120 xla_bridge.py:357] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0228 19:56:28.558446 140398657533120 supervised_point_prediction.py:979] Saving videos to /data3/tap//tap/tapnet_checkpoint/eval_kubric/0
I0228 19:56:28.567507 140398657533120 dataset_info.py:565] Load dataset info from /data3/tap/kubric/movi_e/256x256/1.0.0
W0228 19:56:28.572742 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'uint8'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to uint8.
W0228 19:56:28.574135 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'uint16'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to uint16.
I0228 19:56:28.620231 140398657533120 dataset_info.py:654] Fields info.[splits] from disk and from code do not match. Keeping the one from code.
I0228 19:56:28.620935 140398657533120 dataset_builder.py:522] Reusing dataset movi_e (/data3/tap/kubric/movi_e/256x256/1.0.0)
W0228 19:56:28.622349 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.622643 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'float32'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to float32.
W0228 19:56:28.622788 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.622916 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623071 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'int32'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to int32.
W0228 19:56:28.623173 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623292 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623408 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.623907 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624111 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'string'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to object.
W0228 19:56:28.624262 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624418 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.624617 140398657533120 feature.py:64] `TensorInfo.dtype` is deprecated. Please change your code to use NumPy with the field `TensorInfo.np_dtype` or use TensorFlow with the field `TensorInfo.tf_dtype`.
W0228 19:56:28.625051 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'int64'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to int64.
W0228 19:56:28.625315 140398657533120 dtype_utils.py:43] You use TensorFlow DType <dtype: 'bool'> in tfds.features This will soon be deprecated in favor of NumPy DTypes. In the meantime it was converted to bool.
I0228 19:56:29.712036 140398657533120 logging_logger.py:49] Constructing tf.data.Dataset movi_e for split None, from /data3/tap/kubric/movi_e/256x256/1.0.0
W0228 19:56:32.420510 140398657533120 deprecation.py:337] From /data/anaconda3/envs/tapnet/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W0228 19:56:39.477390 140398657533120 deprecation.py:541] From /data/anaconda3/envs/tapnet/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:1082: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
2023-02-28 19:56:44.292071: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
2023-02-28 19:56:45.191140: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] shape_optimizer failed: INVALID_ARGUMENT: Subshape must have computed start >= end since stride is negative, but is 1 and 3 (computed from start 1 and end 9223372036854775807 over shape with rank 3 and stride-1)
Traceback (most recent call last):
  File "./tapnet/experiment.py", line 429, in <module>
    app.run(main)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "./tapnet/experiment.py", line 421, in main
    platform.main(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
    return f(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
    train.evaluate(experiment_class, config, checkpointer, writer,
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
    return fn(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
    scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
    evaluate_out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 404, in evaluate
    eval_scalars = point_prediction_task.evaluate(
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 514, in evaluate
    self._eval_epoch(
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 1005, in _eval_epoch
    scalars, viz = eval_batch_fn(params, state, inputs, rng)
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 766, in _eval_batch
    occlusion_logits, tracks, loss_scalars = self._infer_batch(
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 577, in _infer_batch
    output, _ = functools.partial(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
    out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 122, in forward
    return self.point_prediction.forward_fn(
  File "/home/ubuntu/contractive/tapnet/supervised_point_prediction.py", line 313, in forward_fn
    return shared_modules['tapnet_model'](
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/ubuntu/contractive/tapnet/tapnet_model.py", line 341, in __call__
    latent = self.tsm_resnet(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/ubuntu/contractive/tapnet/models/tsm_resnet.py", line 383, in __call__
    net = hk.Conv2D(
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
    w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
    return wrapped._current(*args, **kwargs)
  File "/data/anaconda3/envs/tapnet/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
    raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.
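For context, Haiku raises this ValueError when a transformed function is applied with a parameter tree that has no entry for the named module. One possible reading (an assumption, the log does not confirm it) is that no weights were restored from the checkpoint directory before evaluation. Below is a minimal sketch of a check that could be run on the restored parameters, assuming they form the usual Haiku nested mapping of module name to parameter name. The helper find_missing_params and the toy dictionaries are hypothetical and not part of tapnet or Haiku.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def find_missing_params(
    params: Mapping[str, Mapping[str, Any]],
    required: Mapping[str, Sequence[str]],
) -> List[Tuple[str, str]]:
    """Return the (module, parameter) pairs listed in `required` that `params` lacks."""
    missing = []
    for module_name, param_names in required.items():
        # A module that is absent entirely is treated as having no parameters.
        module_params = params.get(module_name, {})
        for name in param_names:
            if name not in module_params:
                missing.append((module_name, name))
    return missing


# Module and parameter name copied from the error message above.
STEM = "tap_net/~/tsm_resnet_video/tsm_resnet_stem"
required = {STEM: ["w"]}

# Toy stand-ins: an empty tree (nothing restored) and a tree that has the weight.
empty_params = {}
loaded_params = {STEM: {"w": [[0.0]]}}

print(find_missing_params(empty_params, required))
print(find_missing_params(loaded_params, required))
```

If the first case matches what the real parameter tree looks like, the checkpoint path or its contents would be the next thing to inspect.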