Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Training Tennis Example - Multiple Brains? #184

Closed · HappySlice closed this issue 6 years ago

HappySlice commented 6 years ago

My impression was that the brain would play against itself, but the example appears to have two brains.

Python throws an exception in the environment.py script about the format of the action input. What is the proper way to go about this? Am I approaching this correctly?

Thanks.

awjuliani commented 6 years ago

Hi @HappySlice,

There are two brains in order to allow for using one as the player and the other as the trained network.

Under what exact conditions are you getting the error in Python? Is this when trying to run ppo.py? Would it be possible to paste the error here?

HappySlice commented 6 years ago

    UnityActionException                      Traceback (most recent call last)
    <ipython-input-4-583a438fd989> in <module>()
         56             info = env.reset(train_mode=train_model, progress=get_progress())[brain_name]
         57         # Decide and take an action
    ---> 58         new_info = trainer.take_action(info, env, brain_name, steps, normalize)
         59         info = new_info
         60         trainer.process_experiences(info, time_horizon, gamma, lambd)

    C:\Users\david\Documents\ml-agents-master\python\ppo\trainer.py in take_action(self, info, env, brain_name, steps, normalize)
         73         self.stats['entropy'].append(ent)
         74         self.stats['learning_rate'].append(learn_rate)
    ---> 75         new_info = env.step(actions, value={brain_name: value})[brain_name]
         76         self.add_experiences(info, new_info, epsi, actions, a_dist, value)
         77         return new_info

    C:\Users\david\Documents\ml-agents-master\python\unityagents\environment.py in step(self, action, memory, value)
        366                     raise UnityActionException(
        367                         "You have {0} brains, you need to feed a dictionary of brain names a keys, "
    --> 368                         "and actions as values".format(self._num_brains))
        369                 else:
        370                     raise UnityActionException(

    UnityActionException: You have 2 brains, you need to feed a dictionary of brain names a keys, and actions as values

HappySlice commented 6 years ago

The above exception occurs when using the Jupyter notebook, and the same happens when running ppo.py.
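
For context, the check that raises this exception lives in environment.py: once more than one external brain is present, env.step refuses a bare action array and requires a dictionary of {brain_name: actions}. A minimal sketch of the two calling conventions against the v0.2-era unityagents API (the file name and brain names here are assumptions, not taken from the Tennis scene):

    import numpy as np
    from unityagents import UnityEnvironment

    # Assumes a local build of the environment; substitute your own binary.
    env = UnityEnvironment(file_name="Tennis")
    info = env.reset(train_mode=True)  # dict of BrainInfo keyed by brain name

    # With one external brain, env.step accepts a bare array of actions.
    # With two, it raises UnityActionException unless given one entry per brain:
    brain_names = ("TennisBrainA", "TennisBrainB")  # hypothetical names
    actions = {
        name: np.random.randn(len(info[name].states),
                              env.brains[name].action_space_size)
        for name in brain_names
    }
    new_info = env.step(actions)  # the result is likewise keyed by brain name
    env.close()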

HappySlice commented 6 years ago

So, ultimately, do I modify the take_action method in the trainer.py script to use a dictionary? I'm not sure how to do that with Python syntax, and the list next to it on line 75 is a bit foreign to me. The function from trainer.py is below:

    def take_action(self, info, env, brain_name, steps, normalize):
        """
        Decides actions given state/observation information, and takes them in environment.
        :param info: Current BrainInfo from environment.
        :param env: Environment to take actions in.
        :param brain_name: Name of brain we are learning model for.
        :param steps: Current step count, used to update the running normalization statistics.
        :param normalize: Whether to update the running mean/variance used to normalize continuous states.
        :return: BrainInfo corresponding to new environment state.
        """
        epsi = None
        feed_dict = {self.model.batch_size: len(info.states)}
        run_list = [self.model.output, self.model.probs, self.model.value, self.model.entropy,
                    self.model.learning_rate]
        if self.is_continuous:
            epsi = np.random.randn(len(info.states), env.brains[brain_name].action_space_size)
            feed_dict[self.model.epsilon] = epsi
        if self.use_observations:
            feed_dict[self.model.observation_in] = np.vstack(info.observations)
        if self.use_states:
            feed_dict[self.model.state_in] = info.states
        if self.is_training and env.brains[brain_name].state_space_type == "continuous" and self.use_states and normalize:
            new_mean, new_variance = self.running_average(info.states, steps, self.model.running_mean,
                                                          self.model.running_variance)
            feed_dict[self.model.new_mean] = new_mean
            feed_dict[self.model.new_variance] = new_variance
            run_list = run_list + [self.model.update_mean, self.model.update_variance]
            actions, a_dist, value, ent, learn_rate, _, _ = self.sess.run(run_list, feed_dict=feed_dict)
        else:
            actions, a_dist, value, ent, learn_rate = self.sess.run(run_list, feed_dict=feed_dict)
        self.stats['value_estimate'].append(value)
        self.stats['entropy'].append(ent)
        self.stats['learning_rate'].append(learn_rate)
        new_info = env.step(actions, value={brain_name: value})[brain_name]
        self.add_experiences(info, new_info, epsi, actions, a_dist, value)
        return new_info
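
For readers puzzled by line 75 above: env.step returns a dictionary of BrainInfo objects keyed by brain name, and the trailing [brain_name] selects this trainer's slice; the value keyword likewise takes a per-brain dict. Unrolled, the line is equivalent to:

    # Equivalent to line 75, split into two steps for readability.
    all_info = env.step(actions, value={brain_name: value})  # {brain_name: BrainInfo}
    new_info = all_info[brain_name]                          # this brain's new state
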
awjuliani commented 6 years ago

Hi @HappySlice,

To train Tennis, simply disable one of the two brains, set the other to External, and set both agents within each tennis area to use that external brain.

HappySlice commented 6 years ago

Thanks! So, if I did want to train two brains for a custom model, how would I go about modifying the scripts to pass in the dictionary?
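
One hedged way to adapt the quoted v0.2 take_action for two independently trained brains would be to run each brain's policy first and then make a single env.step call with per-brain dictionaries. A rough sketch under those assumptions (get_action is a hypothetical split of take_action that returns the actions and value estimate without stepping the environment):

    def take_actions_multi(trainers, env, infos):
        """Decide and take actions for several external brains in one env.step.

        :param trainers: {brain_name: trainer}, one trainer per external brain.
        :param infos: {brain_name: BrainInfo} from the previous reset/step.
        :return: {brain_name: BrainInfo} for the new environment state.
        """
        actions, values = {}, {}
        for name, trainer in trainers.items():
            # get_action (hypothetical) runs the policy network exactly as
            # take_action does, but returns (actions, value) instead of stepping.
            actions[name], values[name] = trainer.get_action(infos[name])
        # With multiple brains, environment.py expects dicts keyed by brain
        # name for both the actions and the value estimates.
        return env.step(actions, value=values)

Each trainer would then run its own add_experiences and process_experiences on its slice of the returned dictionary.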

tylermakes commented 6 years ago

I'm curious about this too (training two separate brains). Also, are there instructions anywhere for the steps needed to set this up? There seem to be key requirements (each training area set to use the one external brain, and another issue mentions it takes millions of steps to make progress). I ask because I'm trying to train an AI opponent in a multiplayer game.

Sposito commented 6 years ago

I am in the same situation here, trying to use two brains in a custom project without success.

awjuliani commented 6 years ago

Hi all,

As of ML-Agents v0.3 we now support training multiple external brains from a single training session. For more information, I would recommend checking out: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-ML-Agents.md
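
Conceptually, and only as a sketch rather than the actual learn.py internals, v0.3 creates one trainer per external brain and steps them all through a single environment, roughly like the following (make_trainer, decide, and record are hypothetical stand-ins for the real trainer API, and max_steps is a placeholder):

    # Conceptual sketch of v0.3-style multi-brain training, not the real code.
    trainers = {name: make_trainer(name) for name in env.external_brain_names}
    info = env.reset(train_mode=True)
    for step in range(max_steps):
        actions = {name: t.decide(info[name]) for name, t in trainers.items()}
        info = env.step(actions)              # one step advances every brain
        for name, t in trainers.items():
            t.record(info[name])              # each trainer learns from its slice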

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.