Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

OnActionReceived always receives a list of zeros #4845

Closed sash-a closed 3 years ago

sash-a commented 3 years ago

Describe the bug

When setting actions through the Python API, ActionBuffers.DiscreteActions in the OnActionReceived method always contains only zeros. I have traced the values I send through the Python API all the way down to the following line in mlagents_envs/environment.py:

outputs = self._communicator.exchange(step_input)

When I print step_input, or more generally _env_actions, the values are not all zeros, yet zeros are what the C# side receives.

For context, here are the relevant lines of my Python script (most importantly the step method):

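from typing import List, Optional, Tuple

import gym
import numpy as np
from mlagents_envs.environment import UnityEnvironment


class UnityGymWrapper(gym.Env):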
    GymResult = Tuple[List[np.ndarray], List[np.ndarray], bool, dict]

    def __init__(self, name: Optional[str], rank: int, render=False, time_scale=50.):
        # channel = EngineConfigurationChannel()
        # channel.set_configuration_parameters(time_scale=time_scale)
        self._e: UnityEnvironment = UnityEnvironment(name, rank, no_graphics=not render)
        self._e.reset()

        self.behaviour_names = list(self._e.behavior_specs.keys())
        print(self.behaviour_names)

    def step(self, actions: List[np.ndarray]) -> GymResult:
        for behaviour_name, action in zip(self.behaviour_names, actions):
            self._e.set_actions(behaviour_name, action)
            # print(f'{behaviour_name}:{self._e._env_actions[behaviour_name]}')

        self._e.step()

        return self.collect_obs()

    def reset(self) -> GymResult:
        self._e.reset()
        return self.collect_obs()

    def collect_obs(self) -> GymResult:
        """
        :returns: a list of observations. Each list item belongs to a different team. Within each item there may be
        an ndarray of observations, where each dimension is the observation of a team member.
        """
        obs = []
        rews = []
        done = False

        for name in self.behaviour_names:
            step, term_step = self._e.get_steps(name)
            if term_step:
                done = True
                obs += term_step.obs
                rews += term_step.reward
            else:
                done = False
                obs += step.obs
                rews += [step.reward]

        return obs, rews, done, {'step': term_step if done else step}

    def render(self, mode='human'):
        raise Warning('Render cannot be called for unity env, it must be set in the constructor')

if __name__ == '__main__':
    print('starting...')
    e = UnityGymWrapper(None, 0, render=True, time_scale=1.)
    done = False
    while not done:
        # np.int was only an alias for the built-in int and is removed in recent NumPy releases
        strikers = np.random.randint(-1, 2, (2, 3), dtype=int)  # random integers from -1 to 1 inclusive
        goalie = np.random.randint(-1, 2, (1, 3), dtype=int)

        ob, _, done, _ = e.step([strikers, goalie])
    print('done')

Here is the debug logging I am doing in Unity:

public override void OnActionReceived(ActionBuffers actionBuffers)
{
    var forwardAxis = actionBuffers.DiscreteActions[0];
    var rightAxis = actionBuffers.DiscreteActions[1];
    var rotateAxis = actionBuffers.DiscreteActions[2];

    Debug.Log($"{forwardAxis}, {rightAxis}, {rotateAxis}");
    ...
}

This is the debug statement that always prints zeros, whereas on the Python side the values are not all zeros right up to the line before they are sent. Note that when using inference I do see non-zero values on the C# side.

Here is the relevant info in the editor: [image]

Environment

sash-a commented 3 years ago

It seems the cause was a mismatched version of the Python API: I needed v0.23.0 to go with Release 12 (for anyone else who has this issue).
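
A version mismatch like this can be caught before launching the environment. The following is a minimal sketch, assuming the Python API is installed under the distribution name mlagents-envs. The helper names installed_version and versions_match are hypothetical, and the expected version "0.23.0" is simply the one reported in this thread for Release 12.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def versions_match(installed: Optional[str], expected: str) -> bool:
    """True only when a version was found and it equals the expected one exactly."""
    return installed is not None and installed.strip() == expected.strip()


if __name__ == "__main__":
    # Assumption: 0.23.0 is the Python API version paired with Release 12 (per this thread).
    expected = "0.23.0"
    found = installed_version("mlagents-envs")
    if versions_match(found, expected):
        print(f"mlagents-envs {found} matches the expected version")
    else:
        print(f"mlagents-envs version is {found!r}, expected {expected!r}")
```

Running this once at the top of a training script, before constructing UnityEnvironment, may surface the mismatch as a clear message rather than as silently zeroed actions.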
