DylanCope / Multi-Agent-RL-with-TF

Training intrinsically motivated, independent Q-learners to play Tic-Tac-Toe
https://dylancope.github.io/Multiagent-RL-with-TFAgents/

Cannot create IMAgent #2

Open FranzPfeifroth opened 3 years ago

FranzPfeifroth commented 3 years ago

I copied this notebook, including tic_tac_toe_env.py, and am trying to get it running on Colab. I'm able to run all cells up to "Creating player objects".

Here I get the error:

ValueError: Only scalar actions are supported now, but action spec is: BoundedTensorSpec(shape=(1,), dtype=tf.int32, name=None, minimum=array(0, dtype=int32), maximum=array(8, dtype=int32))
  In call to configurable 'DqnAgent' (<class 'tf_agents.agents.dqn.dqn_agent.DqnAgent'>)

This is caused by the definition of the action spec:

def action_spec(self):
    position_spec = BoundedArraySpec((1,), np.int32, minimum=0, maximum=8)
    value_spec = BoundedArraySpec((1,), np.int32, minimum=1, maximum=2)
    return {
        'position': position_spec,
        'value': value_spec
    }

The 'position' spec, which describes the action for the IMAgent, is an array, not a scalar. Please advise me on how to solve this problem.
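For reference, the distinction the check is complaining about can be illustrated with plain NumPy (a minimal sketch, not code from the notebook): DqnAgent expects actions described by a rank-0 spec (shape `()`), while the spec above describes a length-1 vector (shape `(1,)`).

```python
import numpy as np

# What a shape-(1,) spec describes: a length-1 vector action.
vector_action = np.zeros((1,), dtype=np.int32)

# What DqnAgent's scalar check expects: a rank-0 (scalar) action.
scalar_action = np.zeros((), dtype=np.int32)

print(vector_action.shape)  # (1,)
print(scalar_action.shape)  # ()
```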

DylanCope commented 3 years ago

As far as I know, there is no "scalar spec"; rather, scalars are just represented by arrays with one element.

Is the error occurring on the line where the position_spec is defined, or afterwards? Perhaps the issue is that nested action specs are no longer supported.

In that case, you would need a scalar that can take 9*2 = 18 possible values, where the first 9 encode "play in position n with value 1" and the latter 9 encode the same for playing with value 2.
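That flattening could be sketched as follows (hypothetical helper names, not part of the notebook; positions run 0-8, so there are nine slots per value):

```python
N_POSITIONS = 9  # board positions 0..8

def encode_action(position: int, value: int) -> int:
    """Flatten (position, value) into a single scalar in [0, 17]."""
    return (value - 1) * N_POSITIONS + position

def decode_action(action: int) -> tuple:
    """Recover (position, value) from the flattened scalar."""
    return action % N_POSITIONS, action // N_POSITIONS + 1
```

The agent would then use a scalar spec with minimum=0 and maximum=17, and the environment would decode each chosen action back into a (position, value) pair.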

FranzPfeifroth commented 3 years ago

I added the complete stack trace below.

I call:

    player_1 = IMAgent(
        tf_ttt_env,
        action_spec=tf_ttt_env.action_spec()['position'],
        action_fn=partial(ttt_action_fn, 1),
        name='Player1'
    )

and this calls the superclass (DqnAgent) constructor in the cell defining the IMAgent class:

    super().__init__(time_step_spec, self._action_spec, q_network,
                     optimizer, name=name, **dqn_kwargs)

Stack trace

    ValueError                                Traceback (most recent call last)
    <ipython-input> in <module>()
         10     action_spec = tf_ttt_env.action_spec()['position'],
         11     action_fn = partial(ttt_action_fn, 1),
    ---> 12     name='Player1'
         13 )
         14

    <ipython-input> in __init__(self, env, observation_spec, action_spec, reward_fn, action_fn, name, q_network, replay_buffer_max_length, learning_rate, training_batch_size, training_parallel_calls, training_prefetch_buffer_size, training_num_steps, **dqn_kwargs)
         49     optimizer,
         50     name=name,
    ---> 51     **dqn_kwargs)
         52 print("After super().__init__")
         53

    /usr/local/lib/python3.6/dist-packages/gin/config.py in gin_wrapper(*args, **kwargs)
       1076     scope_info = " in scope '{}'".format(scope_str) if scope_str else ''
       1077     err_str = err_str.format(name, fn_or_cls, scope_info)
    -> 1078     utils.augment_exception_message_and_reraise(e, err_str)
       1079
       1080 return gin_wrapper

    /usr/local/lib/python3.6/dist-packages/gin/utils.py in augment_exception_message_and_reraise(exception, message)
         47 if six.PY3:
         48     ExceptionProxy.__qualname__ = type(exception).__qualname__
    ---> 49     six.raise_from(proxy.with_traceback(exception.__traceback__), None)
         50 else:
         51     six.reraise(proxy, None, sys.exc_info()[2])

    /usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

    /usr/local/lib/python3.6/dist-packages/gin/config.py in gin_wrapper(*args, **kwargs)
       1053
       1054 try:
    -> 1055     return fn(*new_args, **new_kwargs)
       1056 except Exception as e:  # pylint: disable=broad-except
       1057     err_str = ''

    /usr/local/lib/python3.6/dist-packages/tf_agents/agents/dqn/dqn_agent.py in __init__(self, time_step_spec, action_spec, q_network, optimizer, observation_and_action_constraint_splitter, epsilon_greedy, n_step_update, boltzmann_temperature, emit_log_probability, target_q_network, target_update_tau, target_update_period, td_errors_loss_fn, gamma, reward_scale_factor, gradient_clipping, debug_summaries, summarize_grads_and_vars, train_step_counter, name)
        215 tf.Module.__init__(self, name=name)
        216
    --> 217 self._check_action_spec(action_spec)
        218
        219 if epsilon_greedy is not None and boltzmann_temperature is not None:

    /usr/local/lib/python3.6/dist-packages/tf_agents/agents/dqn/dqn_agent.py in _check_action_spec(self, action_spec)
        295 raise ValueError(
        296     'Only scalar actions are supported now, but action spec is: {}'
    --> 297     .format(action_spec))
        298
        299 spec = flat_action_spec[0]

    ValueError: Only scalar actions are supported now, but action spec is: BoundedTensorSpec(shape=(1,), dtype=tf.int32, name=None, minimum=array(0, dtype=int32), maximum=array(8, dtype=int32))
      In call to configurable 'DqnAgent' (<class 'tf_agents.agents.dqn.dqn_agent.DqnAgent'>)