Getting an error when running train.py: ValueError: ('Observation ({}) outside given space ({})!', 9, Box(-1.0, 1.0, (11,), float32))

meysammoh commented 3 years ago

I get an error when running train.py.

Machine: MacBook Pro 2017, macOS Big Sur 11.0.1 (20B29)

Environment: Anaconda (conda 4.9.2), Python 3.8.5, Ray 1.0.1.post1, TensorFlow 2.3.1

Full console log:

WARNING:tensorflow:From /opt/anaconda3/envs/rl/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-12-08 22:31:33,163 INFO services.py:1090 -- View the Ray dashboard at http://127.0.0.1:8265
2020-12-08 22:31:36,024 INFO logger.py:200 -- pip install 'ray[tune]' to see TensorBoard files.
2020-12-08 22:31:36,024 WARNING logger.py:342 -- Could not instantiate TBXLogger: No module named 'tensorboardX'.
2020-12-08 22:31:36,026 INFO trainer.py:592 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
2020-12-08 22:31:36,026 INFO trainer.py:1064 -- `_use_trajectory_view_api` only supported for PyTorch so far! Will run w/o.
2020-12-08 22:31:36,026 INFO trainer.py:617 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=20565) WARNING:tensorflow:From /opt/anaconda3/envs/rl/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=20565) Instructions for updating:
(pid=20565) non-resource variables are not supported in the long term
(pid=20564) WARNING:tensorflow:From /opt/anaconda3/envs/rl/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=20564) Instructions for updating:
(pid=20564) non-resource variables are not supported in the long term
2020-12-08 22:31:48,167 INFO trainable.py:252 -- Trainable.setup took 12.142 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2020-12-08 22:31:48,167 WARNING util.py:40 -- Install gputil for GPU system monitoring.
WARNING:tensorflow:From /opt/anaconda3/envs/rl/lib/python3.8/site-packages/ray/rllib/policy/tf_policy.py:875: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Variable.assign which has equivalent behavior in 2.X.
(pid=20565) WARNING:tensorflow:From /opt/anaconda3/envs/rl/lib/python3.8/site-packages/ray/rllib/policy/tf_policy.py:875: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
(pid=20565) Instructions for updating:
(pid=20565) Prefer Variable.assign which has equivalent behavior in 2.X.
(pid=20564) WARNING:tensorflow:From /opt/anaconda3/envs/rl/lib/python3.8/site-packages/ray/rllib/policy/tf_policy.py:875: Variable.load (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
(pid=20564) Instructions for updating:
(pid=20564) Prefer Variable.assign which has equivalent behavior in 2.X.
 1 reward -21.00/ -6.90/ 10.00 len 7.94 saved tmp/exa/checkpoint_1/checkpoint-1
 2 reward -20.00/  0.87/ 10.00 len 5.64 saved tmp/exa/checkpoint_2/checkpoint-2
 3 reward -19.00/  5.68/ 10.00 len 3.96 saved tmp/exa/checkpoint_3/checkpoint-3
 4 reward -18.00/  7.16/ 10.00 len 3.32 saved tmp/exa/checkpoint_4/checkpoint-4
 5 reward -16.00/  7.66/ 10.00 len 3.02 saved tmp/exa/checkpoint_5/checkpoint-5
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
observations (InputLayer)       [(None, 11)]         0                                            
__________________________________________________________________________________________________
fc_1 (Dense)                    (None, 256)          3072        observations[0][0]               
__________________________________________________________________________________________________
fc_value_1 (Dense)              (None, 256)          3072        observations[0][0]               
__________________________________________________________________________________________________
fc_2 (Dense)                    (None, 256)          65792       fc_1[0][0]                       
__________________________________________________________________________________________________
fc_value_2 (Dense)              (None, 256)          65792       fc_value_1[0][0]                 
__________________________________________________________________________________________________
fc_out (Dense)                  (None, 2)            514         fc_2[0][0]                       
__________________________________________________________________________________________________
value_out (Dense)               (None, 1)            257         fc_value_2[0][0]                 
==================================================================================================
Total params: 138,499
Trainable params: 138,499
Non-trainable params: 0
__________________________________________________________________________________________________
None
2020-12-08 22:32:08,653 INFO trainable.py:481 -- Restored on 192.168.0.3 from checkpoint: tmp/exa/checkpoint_5/checkpoint-5
2020-12-08 22:32:08,653 INFO trainable.py:489 -- Current state after restoring: {'_iteration': 5, '_timesteps_total': None, '_time_total': 20.26572561264038, '_episodes_total': 4752}

Traceback (most recent call last):
  File "train.py", line 83, in <module>
    main()
  File "train.py", line 69, in main
    action = agent.compute_action(state)
  File "/opt/anaconda3/envs/rl/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 819, in compute_action
    preprocessed = self.workers.local_worker().preprocessors[
  File "/opt/anaconda3/envs/rl/lib/python3.8/site-packages/ray/rllib/models/preprocessors.py", line 166, in transform
    self.check_shape(observation)
  File "/opt/anaconda3/envs/rl/lib/python3.8/site-packages/ray/rllib/models/preprocessors.py", line 62, in check_shape
    raise ValueError(
ValueError: ('Observation ({}) outside given space ({})!', 9, Box(-1.0, 1.0, (11,), float32))

ybz21 commented 3 years ago

I ran into this problem too.

ceteri commented 3 years ago

Will try to recreate that here

francoisbeaussier commented 2 years ago

It looks like this was fixed, and Discrete is now used instead of Box: https://github.com/DerwenAI/gym_example/commit/5c5208c892c02bafcf63c642288a9330ff4dbec4#diff-66b38a640fb064f141690cea56a18b51f82d860487263965dec468e542b10c36
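
For anyone else hitting this, here is a rough sketch of why the check in the traceback rejects the observation. The environment returned the bare integer 9 as its state, while the declared space was a Box of shape (11,), which expects a vector of 11 floats in [-1, 1]. The two helpers below are simplified, hypothetical stand-ins written in plain Python. They are not the gym or RLlib API, and the Discrete size of 11 is only a guess based on the Box shape in the error.

```python
# Simplified stand-ins for the membership checks of a Box and a Discrete space.
# Hypothetical helpers for illustration only; NOT the gym or RLlib implementation.

def box_contains(obs, low, high, shape):
    """True if obs is a sequence of length shape[0] with every entry in [low, high]."""
    if not isinstance(obs, (list, tuple)):
        return False  # a bare scalar such as 9 has no vector shape
    if len(obs) != shape[0]:
        return False
    return all(low <= x <= high for x in obs)


def discrete_contains(obs, n):
    """True if obs is an integer in the range 0..n-1."""
    return isinstance(obs, int) and not isinstance(obs, bool) and 0 <= obs < n


state = 9  # the observation value reported in the ValueError

# Assumed spaces: Box(-1.0, 1.0, (11,)) from the error, Discrete(11) as a guess.
in_box = box_contains(state, -1.0, 1.0, (11,))
in_discrete = discrete_contains(state, 11)

print("inside Box:", in_box)
print("inside Discrete:", in_discrete)
```

Under those assumptions the integer state fails the Box check and passes the Discrete one, which is consistent with the linked commit making the error go away.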
