HorizonRobotics / alf

Agent Learning Framework https://alf.readthedocs.io
Apache License 2.0

ValueError: all input arrays must have the same shape #1094

Closed: ipsec closed this issue 2 years ago

ipsec commented 2 years ago

Hi,

I'm having an issue running the CartPole-v1 gym environment.

After some steps I receive the error ValueError: all input arrays must have the same shape

I1129 13:53:48.921936 4383311232 policy_trainer.py:435] muzero_pendulum_conf.py -> cartpole: 82 time=0.804 throughput=3182.77
I1129 13:53:50.561201 4383311232 policy_trainer.py:435] muzero_pendulum_conf.py -> cartpole: 84 time=0.814 throughput=3145.37
I1129 13:53:52.200152 4383311232 policy_trainer.py:435] muzero_pendulum_conf.py -> cartpole: 86 time=0.817 throughput=3132.63
Do you want to save checkpoint? (y/n): y
I1129 13:54:02.451696 4383311232 checkpoint_utils.py:300] Checkpoint 'ckpt-89' is saved successfully.
I1129 13:54:02.451909 4383311232 parallel_environment.py:168] Closing all processes.
I1129 13:54:02.477777 4383311232 parallel_environment.py:171] All processes closed.
Traceback (most recent call last):
  File "/Users/fernando/miniforge3/envs/alf/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/fernando/miniforge3/envs/alf/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/fernando/Documents/dev/projects/alf/alf/bin/train.py", line 234, in <module>
    app.run(main)
  File "/Users/fernando/miniforge3/envs/alf/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/Users/fernando/miniforge3/envs/alf/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/Users/fernando/Documents/dev/projects/alf/alf/bin/train.py", line 202, in main
    training_worker(
  File "/Users/fernando/Documents/dev/projects/alf/alf/bin/train.py", line 177, in training_worker
    raise e
  File "/Users/fernando/Documents/dev/projects/alf/alf/bin/train.py", line 171, in training_worker
    _train(root_dir, rank, world_size)
  File "/Users/fernando/Documents/dev/projects/alf/alf/bin/train.py", line 140, in _train
    trainer.train()
  File "/Users/fernando/Documents/dev/projects/alf/alf/trainers/policy_trainer.py", line 170, in train
    common.run_under_record_context(
  File "/Users/fernando/Documents/dev/projects/alf/alf/utils/common.py", line 292, in run_under_record_context
    func()
  File "/Users/fernando/Documents/dev/projects/alf/alf/trainers/policy_trainer.py", line 433, in _train
    train_steps = self._algorithm.train_iter()
  File "/Users/fernando/Documents/dev/projects/alf/alf/algorithms/rl_algorithm.py", line 619, in train_iter
    return self._train_iter_off_policy()
  File "/Users/fernando/Documents/dev/projects/alf/alf/algorithms/rl_algorithm.py", line 649, in _train_iter_off_policy
    experience = self.unroll(config.unroll_length)
  File "/Users/fernando/Documents/dev/projects/alf/alf/algorithms/rl_algorithm.py", line 506, in unroll
    return self._unroll(unroll_length)
  File "/Users/fernando/Documents/dev/projects/alf/alf/utils/common.py", line 983, in _func
    ret = func(*args, **kwargs)
  File "/Users/fernando/Documents/dev/projects/alf/alf/algorithms/rl_algorithm.py", line 558, in _unroll
    next_time_step = self._env.step(action)
  File "/Users/fernando/Documents/dev/projects/alf/alf/environments/alf_environment.py", line 209, in step
    self._current_time_step = self._step(action)
  File "/Users/fernando/Documents/dev/projects/alf/alf/environments/parallel_environment.py", line 164, in _step
    return self._stack_time_steps(time_steps)
  File "/Users/fernando/Documents/dev/projects/alf/alf/environments/parallel_environment.py", line 176, in _stack_time_steps
    stacked = nest.fast_map_structure_flatten(
  File "/Users/fernando/Documents/dev/projects/alf/alf/nest/nest.py", line 508, in fast_map_structure_flatten
    return pack_sequence_as(structure, [func(*x) for x in entries])
  File "/Users/fernando/Documents/dev/projects/alf/alf/nest/nest.py", line 508, in <listcomp>
    return pack_sequence_as(structure, [func(*x) for x in entries])
  File "/Users/fernando/Documents/dev/projects/alf/alf/environments/parallel_environment.py", line 177, in <lambda>
    lambda *arrays: numpy.stack(arrays),
  File "<__array_function__ internals>", line 5, in stack
  File "/Users/fernando/miniforge3/envs/alf/lib/python3.8/site-packages/numpy/core/shape_base.py", line 426, in stack
    raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape

I'm running CartPole with:

python -m alf.bin.train --conf muzero_pendulum_conf.py --root_dir ~/tmp/cartpole/ --conf_param create_environment.env_name="'CartPole-v1'" --conf_param SimpleMCTSModel.num_sampled_actions=None

A similar error (I think, because of the RuntimeError: Different lengths!) occurs while running LunarLander-v2:

I1128 18:21:50.349225 4686296576 policy_trainer.py:442] muzero_pendulum_conf.py -> lunarlander: 1558 time=3.793 throughput=674.95
E1128 18:21:50.637818 4686296576 nest.py:78] pack_sequence_as() fails for TimeStep(step_type=TensorSpec(shape=(), dtype=torch.int32), reward=TensorSpec(shape=(), dtype=torch.float32), discount=BoundedTensorSpec(shape=(), dtype=torch.float32, minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)), observation=BoundedTensorSpec(shape=(8,), dtype=torch.float32, minimum=array(-inf, dtype=float32), maximum=array(inf, dtype=float32)), prev_action=BoundedTensorSpec(shape=(), dtype=torch.int64, minimum=array(0), maximum=array(3)), env_id=TensorSpec(shape=(), dtype=torch.int32), untransformed=(), env_info={}) and [array([0.], dtype=float32), array([0], dtype=int32), array([ True]), array([[0.16114998, 0.00508255, 0.14029714, 0.02261066, 0.00377822,
        0.19899227, 0.        , 0.        ]], dtype=float32), array([2]), array([-11.06166], dtype=float32), array([2], dtype=int32)]. Error message: 'Different lengths! TimeStep(step_type=TensorSpec(shape=(), dtype=torch.int32), reward=TensorSpec(shape=(), dtype=torch.float32), discount=BoundedTensorSpec(shape=(), dtype=torch.float32, minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)), observation=BoundedTensorSpec(shape=(8,), dtype=torch.float32, minimum=array(-inf, dtype=float32), maximum=array(inf, dtype=float32)), prev_action=BoundedTensorSpec(shape=(), dtype=torch.int64, minimum=array(0), maximum=array(3)), env_id=TensorSpec(shape=(), dtype=torch.int32), untransformed=(), env_info={}) <-> [array([0.], dtype=float32), array([0], dtype=int32), array([ True]), array([[0.16114998, 0.00508255, 0.14029714, 0.02261066, 0.00377822,
        0.19899227, 0.        , 0.        ]], dtype=float32), array([2]), array([-11.06166], dtype=float32), array([2], dtype=int32)]'
Do you want to save checkpoint? (y/n): n
I1129 13:13:19.872096 4686296576 parallel_environment.py:168] Closing all processes.
I1129 13:13:19.883817 4686296576 parallel_environment.py:171] All processes closed.
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/fernando/PycharmProjects/alf/alf/bin/train.py", line 234, in <module>
    app.run(main)
  File "/Users/fernando/PycharmProjects/alf/venv/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/Users/fernando/PycharmProjects/alf/venv/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/Users/fernando/PycharmProjects/alf/alf/bin/train.py", line 203, in main
    rank=0, world_size=1, conf_file=conf_file, root_dir=root_dir)
  File "/Users/fernando/PycharmProjects/alf/alf/bin/train.py", line 177, in training_worker
    raise e
  File "/Users/fernando/PycharmProjects/alf/alf/bin/train.py", line 171, in training_worker
    _train(root_dir, rank, world_size)
  File "/Users/fernando/PycharmProjects/alf/alf/bin/train.py", line 140, in _train
    trainer.train()
  File "/Users/fernando/PycharmProjects/alf/alf/trainers/policy_trainer.py", line 176, in train
    summary_max_queue=self._summary_max_queue)
  File "/Users/fernando/PycharmProjects/alf/alf/utils/common.py", line 286, in run_under_record_context
    func()
  File "/Users/fernando/PycharmProjects/alf/alf/trainers/policy_trainer.py", line 433, in _train
    train_steps = self._algorithm.train_iter()
  File "/Users/fernando/PycharmProjects/alf/alf/algorithms/rl_algorithm.py", line 619, in train_iter
    return self._train_iter_off_policy()
  File "/Users/fernando/PycharmProjects/alf/alf/algorithms/rl_algorithm.py", line 649, in _train_iter_off_policy
    experience = self.unroll(config.unroll_length)
  File "/Users/fernando/PycharmProjects/alf/alf/algorithms/rl_algorithm.py", line 506, in unroll
    return self._unroll(unroll_length)
  File "/Users/fernando/PycharmProjects/alf/alf/utils/common.py", line 977, in _func
    ret = func(*args, **kwargs)
  File "/Users/fernando/PycharmProjects/alf/alf/algorithms/rl_algorithm.py", line 558, in _unroll
    next_time_step = self._env.step(action)
  File "/Users/fernando/PycharmProjects/alf/alf/environments/alf_environment.py", line 209, in step
    self._current_time_step = self._step(action)
  File "/Users/fernando/PycharmProjects/alf/alf/environments/parallel_environment.py", line 164, in _step
    return self._stack_time_steps(time_steps)
  File "/Users/fernando/PycharmProjects/alf/alf/environments/parallel_environment.py", line 178, in _stack_time_steps
    self._time_step_with_env_info_spec, *time_steps)
  File "/Users/fernando/PycharmProjects/alf/alf/nest/nest.py", line 508, in fast_map_structure_flatten
    return pack_sequence_as(structure, [func(*x) for x in entries])
  File "/Users/fernando/PycharmProjects/alf/alf/nest/nest.py", line 79, in pack_sequence_as
    raise e
  File "/Users/fernando/PycharmProjects/alf/alf/nest/nest.py", line 74, in pack_sequence_as
    return cnest.pack_sequence_as(nest, flat_seq)
RuntimeError: Different lengths! TimeStep(step_type=TensorSpec(shape=(), dtype=torch.int32), reward=TensorSpec(shape=(), dtype=torch.float32), discount=BoundedTensorSpec(shape=(), dtype=torch.float32, minimum=array(0., dtype=float32), maximum=array(1., dtype=float32)), observation=BoundedTensorSpec(shape=(8,), dtype=torch.float32, minimum=array(-inf, dtype=float32), maximum=array(inf, dtype=float32)), prev_action=BoundedTensorSpec(shape=(), dtype=torch.int64, minimum=array(0), maximum=array(3)), env_id=TensorSpec(shape=(), dtype=torch.int32), untransformed=(), env_info={}) <-> [array([0.], dtype=float32), array([0], dtype=int32), array([ True]), array([[0.16114998, 0.00508255, 0.14029714, 0.02261066, 0.00377822,
        0.19899227, 0.        , 0.        ]], dtype=float32), array([2]), array([-11.06166], dtype=float32), array([2], dtype=int32)]

I'm running LunarLander with:

python -m alf.bin.train --conf muzero_pendulum_conf.py --root_dir ~/tmp/lunarlander/ --conf_param create_environment.env_name="'LunarLander-v2'" --conf_param SimpleMCTSModel.num_sampled_actions=None
ipsec commented 2 years ago

The error occurs when self._current_time_step receives an env_info different from {}

https://github.com/HorizonRobotics/alf/blob/2e12066c7988b551204e12fc413d7fe6ec75e97f/alf/environments/alf_environment.py#L209-L210

Sometimes the env_info comes back with {'TimeLimit.truncated': array(True)}

So, in https://github.com/HorizonRobotics/alf/blob/2e12066c7988b551204e12fc413d7fe6ec75e97f/alf/nest/nest.py#L74 the call fails because the flattened sequence has the wrong length:

Working step:

[array([1.], dtype=float32), 
array([0], dtype=int32), 
array([[-0.02575713, -0.03751579,  0.10740692,  0.23765415]], dtype=float32), 
array([1]), 
array([1.], dtype=float32), 
array([1], dtype=int32)]

Failing step:

[array([0.], dtype=float32), 
array([0], dtype=int32), 
array([ True]), 
array([[-0.43213508, -0.35315424,  0.00651777,  0.4939275 ]], dtype=float32), 
array([0]), 
array([1.], dtype=float32), 
array([2], dtype=int32)]

The extra array([ True]) from env_info is being included in flat_seq.
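
A minimal sketch of that mismatch (illustrative values only, not the actual ALF internals): once the flattened time steps have different lengths, misaligned entries such as one env's observation and another env's scalar field get stacked together, and numpy.stack raises the error above.

import numpy as np

# Illustrative only: an observation of shape (4,) paired with a scalar of shape ().
obs = np.array([-0.025, -0.037, 0.107, 0.237], dtype=np.float32)
scalar = np.array(1.0, dtype=np.float32)
np.stack([obs, scalar])  # ValueError: all input arrays must have the same shape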

hnyu commented 2 years ago

> The error occurs when self._current_time_step receives an env_info different from {}. Sometimes the env_info comes back with {'TimeLimit.truncated': array(True)}, and the extra array([ True]) ends up in flat_seq, so pack_sequence_as fails.

Hi @ipsec, sorry for the late reply. It seems that you are (for reasons unknown) using the TimeLimit wrapper provided by Gym:

https://github.com/openai/gym/blob/master/gym/wrappers/time_limit.py

Whenever there is a timeout event, it puts a "TimeLimit.truncated" field into env_info. However, ALF doesn't support an env_info whose set of fields varies between steps. You can either write a gym wrapper that removes this field, or one that always fills it in.
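
A minimal sketch of both options (not ALF code; the wrapper names are made up here, and the classic 4-tuple Gym step API is assumed):

import gym


class StripTimeLimitInfo(gym.Wrapper):
    """Drop the variable 'TimeLimit.truncated' field from env_info."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info.pop('TimeLimit.truncated', None)
        return obs, reward, done, info


class FillTimeLimitInfo(gym.Wrapper):
    """Always provide 'TimeLimit.truncated' so env_info has a fixed set of fields."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info.setdefault('TimeLimit.truncated', False)
        return obs, reward, done, info

Either wrapper keeps the structure of env_info constant across steps, which is what the nest stacking expects.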

There is another good reason, explained here, why ALF tries to avoid this TimeLimit wrapper:

https://alf.readthedocs.io/en/latest/tutorial/environments_and_wrappers.html#step-type-and-discount

In general, please make sure to do

import gym

gym_spec = gym.spec(environment_name)
gym_env = gym_spec.make()  # spec.make() does not apply the TimeLimit wrapper

to avoid letting Gym wrap the environment in its TimeLimit wrapper.

You can take a look at load() in alf/environments/suite_gym.py for an example.

ipsec commented 2 years ago

Hi @hnyu.

I don't know why CartPole is using the TimeLimit wrapper. My custom env is working fine.

Thanks
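
For what it's worth, gym.make() automatically wraps any environment registered with max_episode_steps in TimeLimit, and CartPole-v1 is registered with max_episode_steps=500, which is likely where the wrapper comes from. A quick check (a sketch, assuming a Gym version from that era):

import gym

env = gym.make('CartPole-v1')
print(type(env))  # a TimeLimit wrapper, applied automatically by gym.make()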

hnyu commented 2 years ago

> Hi @hnyu.
>
> I don't know why CartPole is using the TimeLimit wrapper. My custom env is working fine.
>
> Thanks

Hi @ipsec, did you figure out the reason? I just ran the same training command for CartPole on my side, and it seemed to work without any problem.