ValueError: Unable to retrieve parameter 'w' when trying to use `eval_inference`

jeremyrcoyle commented 1 year ago

When invoking experiment.py to do inference:

python3 ./tapnet/experiment.py \
  --config=./tapnet/configs/tapnet_config.py \
  --jaxline_mode=eval_inference \
  --config.checkpoint_dir=./tapnet/checkpoint/ \
  --config.experiment_kwargs.config.inference.input_video_path=fixed10.mp4 \
  --config.experiment_kwargs.config.inference.output_video_path=result.mp4 \
  --config.experiment_kwargs.config.inference.resize_height=256 \
  --config.experiment_kwargs.config.inference.resize_width=256 \
  --config.experiment_kwargs.config.inference.num_points=20

I get the following error:

Traceback (most recent call last):
  File "./tapnet/experiment.py", line 431, in <module>
    app.run(main)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "./tapnet/experiment.py", line 424, in main
    platform.main(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
    return f(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
    train.evaluate(experiment_class, config, checkpointer, writer,
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
    return fn(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
    scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
    evaluate_out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 405, in evaluate
    eval_scalars = point_prediction_task.evaluate(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 370, in evaluate
    self._eval_inference(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 981, in _eval_inference
    outputs, _ = self._infer_batch(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 440, in _infer_batch
    output, _ = functools.partial(wrapped_forward_fn, input_key=input_key)(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
    out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 125, in forward
    return self.point_prediction.forward_fn(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 150, in forward_fn
    return shared_modules[self.model_key](
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/tapnet/tapnet_model.py", line 215, in __call__
    latent = self.tsm_resnet(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/tapnet/models/tsm_resnet.py", line 383, in __call__
    net = hk.Conv2D(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
    w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
    return wrapped._current(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
    raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.

I'm running this on a local GPU. The live_demo.py script works for me, so I'm not sure what the issue is here.

cdoersch commented 1 year ago

live_demo.py uses a TAPIR model, but it looks like you're using a TAP-Net config. What checkpoint file are you trying to use with that code? Did you perhaps intend to use a TAPIR config?

jeremyrcoyle commented 1 year ago

I'm using the checkpoint from https://storage.googleapis.com/dm-tapnet/causal_tapir_checkpoint.npy. I see that above I'm referencing a TAP-Net config, so I tried again with a TAPIR config, and got the same error.

jeremyrcoyle commented 1 year ago

Please let me know if there is any more info I can provide or debugging steps on my end.

cdoersch commented 1 year ago

Is it an option to use the code snippet we provide for inference in the colab?

Has the error message changed from using the tapir config? The traceback you provided above has tap_net/ as the prefix for the variable names; it should be tapir if you're actually running a tapir model. Without more information it's difficult to guess why that's happening.
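One way to act on the prefix hint above is to compare the module name in the error with the module names stored in the checkpoint. The sketch below is hypothetical: it assumes the parameters have already been loaded into a Haiku-style dict keyed by module path, the module names in the toy dict are made up, and `top_level_prefix` and `modules_with_prefix` are illustrative helpers rather than part of tapnet.

```python
def top_level_prefix(module_name):
    """Return the first path component of a Haiku-style module name."""
    return module_name.split("/", 1)[0]


def modules_with_prefix(params, prefix):
    """List parameter modules whose top-level component equals `prefix`."""
    return sorted(name for name in params if top_level_prefix(name) == prefix)


# Toy stand-in for loaded checkpoint parameters (hypothetical module names).
params = {
    "tapir/~/resnet/conv": {"w": None},
    "tapir/~/mixer/linear": {"w": None, "b": None},
}

# Module name copied from the ValueError in the traceback.
failing_module = "tap_net/~/tsm_resnet_video/tsm_resnet_stem"
prefix = top_level_prefix(failing_module)

matches = modules_with_prefix(params, prefix)
if not matches:
    available = sorted({top_level_prefix(name) for name in params})
    print(f"No parameters under '{prefix}'; checkpoint has: {available}")
```

If the model being built asks for modules under one prefix while the checkpoint only holds another, the config and checkpoint probably do not belong together.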

wenshengyoung commented 1 year ago

Did you solve the problem? I had the same thing happen to me.

ldg810 commented 1 year ago

I am getting the same error. Is there any solution?

ldg810 commented 1 year ago

> I am getting same error... Is there any solution??

I found the problem: you should have a checkpoint.npy file in the checkpoint path.

wget https://storage.googleapis.com/dm-tapnet/checkpoint.npy -O tapnet/checkpoint/checkpoint.npy
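Before launching the experiment, it can help to confirm that the checkpoint file is actually where `--config.checkpoint_dir` points. This is a minimal sketch, assuming (per the comment above) that the experiment expects a file named `checkpoint.npy` in that directory; `checkpoint_status` is a hypothetical helper, not part of tapnet.

```python
from pathlib import Path


def checkpoint_status(checkpoint_dir, filename="checkpoint.npy"):
    """Describe whether `filename` exists and is non-empty in `checkpoint_dir`."""
    path = Path(checkpoint_dir) / filename
    if not path.is_file():
        return f"missing: {path}"
    size = path.stat().st_size
    if size == 0:
        return f"empty: {path}"
    return f"found: {path} ({size} bytes)"


if __name__ == "__main__":
    # Directory taken from --config.checkpoint_dir in the original command.
    print(checkpoint_status("./tapnet/checkpoint/"))
```

A "missing" or "empty" result would be consistent with the model finding no stored parameters to restore, although the thread does not confirm exactly how the checkpoint is loaded.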

cdalinghaus commented 1 year ago

> I am getting same error... Is there any solution??
>
> I found the problem. you should have checkpoint.npy file in checkpoint path.
> wget https://storage.googleapis.com/dm-tapnet/checkpoint.npy -O tapnet/checkpoint/checkpoint.npy

Also, this is a different checkpoint than the one in:

> I'm using the checkpoint from https://storage.googleapis.com/dm-tapnet/causal_tapir_checkpoint.npy. I see that above I'm referencing a TAP-Net config, so I tried again with a TAPIR config, and got the same error.

Using https://storage.googleapis.com/dm-tapnet/checkpoint.npy, I got it to work with the experiment script.

nutsintheshell commented 9 months ago

I tried your method, but another error occurred:

Traceback (most recent call last):
  File "/home/jishengyin/anaconda3/envs/tapnet/lib/python3.10/site-packages/numpy/lib/npyio.py", line 465, in load
    return pickle.load(fid, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '-'.
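An `invalid load key` error from `np.load` generally means the file is not in NumPy's `.npy` format, so NumPy fell back to unpickling it. One possible cause, which the thread does not confirm, is a download made with lowercase `wget -o`: that option writes wget's log to the named file, whereas uppercase `-O` writes the downloaded document. A wget log starts with `--`, which would match the `'-'` load key. The sketch below checks the file's leading bytes; `looks_like_npy` and `first_bytes` are hypothetical helpers.

```python
NPY_MAGIC = b"\x93NUMPY"  # first six bytes of every .npy file


def looks_like_npy(path):
    """Return True if the file at `path` starts with the .npy magic string."""
    with open(path, "rb") as f:
        return f.read(len(NPY_MAGIC)) == NPY_MAGIC


def first_bytes(path, n=40):
    """Return the first `n` bytes, handy for spotting text such as a wget log."""
    with open(path, "rb") as f:
        return f.read(n)
```

If `looks_like_npy("tapnet/checkpoint/checkpoint.npy")` returns False and `first_bytes` shows readable text, re-downloading with `-O` is a reasonable next step.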

nutsintheshell commented 9 months ago

I would like to evaluate a model. As I understand it, evaluation means first training a model and then evaluating it on an evaluation dataset, which would mean it doesn't need a pretrained model. So I can't understand why I got the error.

google-deepmind / tapnet

ValueError: Unable to retrieve parameter 'w' when trying to use `eval_inference` #38