OnActionReceived always receives a list of zeros

sash-a commented 3 years ago

Describe the bug

When setting actions through the Python API, the ActionBuffers.DiscreteActions in the OnActionReceived method always contains a list of zeros. I have traced what I am sending through the Python API all the way down to the following line in mlagents_envs/environment.py:

outputs = self._communicator.exchange(step_input)

When I print step_input (or, more generally, _env_actions) on the Python side, it is not a list of zeros, yet a list of zeros is what the C# side receives.
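To make the comparison with the C# log concrete, the Python-side check described above could be sketched roughly as follows. `summarize_actions` is a hypothetical helper, not part of mlagents_envs, and it assumes the actions are laid out as one row per agent and one column per discrete branch. It uses only the standard library, so a NumPy array would be passed in as `action.tolist()`.

```python
from typing import Dict, List, Sequence


def summarize_actions(actions: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Summarize a batch of discrete actions (hypothetical helper).

    Assumes one row per agent and one column per discrete branch.
    """
    # Flatten the batch so every action value can be inspected at once.
    flat: List[int] = [value for row in actions for value in row]
    return {
        "agents": len(actions),
        "branches": len(actions[0]) if actions else 0,
        "nonzero": sum(1 for value in flat if value != 0),
        "all_zero": all(value == 0 for value in flat),
    }


if __name__ == "__main__":
    # Example batch: two agents with three discrete branches each.
    print(summarize_actions([[1, 0, 2], [0, 0, 1]]))
```

Logging this summary right before each `set_actions` call gives a record that can be lined up against the `Debug.Log` output in Unity.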

For some context, here are the relevant lines of my Python script (most importantly the step method):

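from typing import List, Optional, Tuple

import gym
import numpy as np
from mlagents_envs.environment import UnityEnvironment


class UnityGymWrapper(gym.Env):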
    GymResult = Tuple[List[np.ndarray], List[np.ndarray], bool, dict]

    def __init__(self, name: Optional[str], rank: int, render=False, time_scale=50.):
        # channel = EngineConfigurationChannel()
        # channel.set_configuration_parameters(time_scale=time_scale)
        self._e: UnityEnvironment = UnityEnvironment(name, rank, no_graphics=not render)
        self._e.reset()

        self.behaviour_names = list(self._e.behavior_specs.keys())
        print(self.behaviour_names)

    def step(self, actions: List[np.ndarray]) -> GymResult:
        for behaviour_name, action in zip(self.behaviour_names, actions):
            self._e.set_actions(behaviour_name, action)
            # print(f'{behaviour_name}:{self._e._env_actions[behaviour_name]}')

        self._e.step()

        return self.collect_obs()

    def reset(self) -> GymResult:
        self._e.reset()
        return self.collect_obs()

    def collect_obs(self) -> GymResult:
        """
        :returns: a list of observations. Each list item belongs to a different team. Each item may be an
        ndarray of observations, where each dimension is the observation of a team member.
        """
        obs = []
        rews = []
        done = False

        for name in self.behaviour_names:
            step, term_step = self._e.get_steps(name)
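            if term_step:
                # A non-empty TerminalSteps batch means at least one agent finished.
                done = True
                obs += term_step.obs
                rews += [term_step.reward]
            else:
                # Do not reset done here: an earlier behaviour may already have terminated.
                obs += step.obs
                rews += [step.reward]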

        return obs, rews, done, {'step': term_step if done else step}

    def render(self, mode='human'):
        raise Warning('Render cannot be called for unity env, it must be set in the constructor')

if __name__ == '__main__':
    print('starting...')
    e = UnityGymWrapper(None, 0, render=True, time_scale=1.)
    done = False
    while not done:
        # np.int was removed in NumPy 1.24, so an explicit integer dtype is used instead.
        strikers = np.random.randint(-1, 2, (2, 3), dtype=np.int32)  # random integers from -1 to 1
        goalie = np.random.randint(-1, 2, (1, 3), dtype=np.int32)

        ob, _, done, _ = e.step([strikers, goalie])
    print('done')

Here is the debug logging I am doing in Unity:

public override void OnActionReceived(ActionBuffers actionBuffers)
{
    var forwardAxis = actionBuffers.DiscreteActions[0];
    var rightAxis = actionBuffers.DiscreteActions[1];
    var rotateAxis = actionBuffers.DiscreteActions[2];

    Debug.Log($"{forwardAxis}, {rightAxis}, {rotateAxis}");
    ...
}

This is the debug statement that always prints zeros, whereas on the Python side, right up to the line where the actions are sent, the values are not all zeros. Note that when using inference I do see non-zero values on the C# side.

Here is the relevant info in the editor

Environment

Unity Version: 2019.4.10f1
OS + version: Ubuntu 20.10

Python ML-Agents version:

conda list | grep mlagents
mlagents                  0.20.0                   pypi_0    pypi
mlagents-envs             0.20.0                   pypi_0    pypi

C# ML-Agents version: release_12 from the GitHub release page
Environment: A modified version of the striker vs goalie env, removing the raycast observations in favour of sending ball/agent positions directly. Also only a single field instead of multiple. (If required I can push this to a repo)

sash-a commented 3 years ago

It seems this was caused by a version mismatch in the Python API: I needed v0.23.0 to go with release 12 (for anyone else who has this issue).
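For anyone wanting to check for the same mismatch, a minimal sketch along these lines may help. It assumes that both `mlagents` and `mlagents-envs` should report the 0.23.0 version mentioned above (the two packages share a version number in the `conda list` output earlier in this issue). The helper names are hypothetical, and only the standard library's `importlib.metadata` is used.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(installed: Optional[str], expected: str) -> bool:
    """True only when the package is installed and its version matches exactly."""
    return installed is not None and installed == expected


if __name__ == "__main__":
    # Assumption: 0.23.0 is the Python version reported above as working with release 12.
    expected = "0.23.0"
    for package in ("mlagents", "mlagents-envs"):
        found = installed_version(package)
        status = "ok" if matches_expected(found, expected) else "MISMATCH"
        print(f"{package}: installed={found} expected={expected} [{status}]")
```

Running this in the same conda environment that launches the training script shows whether the Python side matches the C# release in use.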


Unity-Technologies / ml-agents

OnActionReceived always receives a list of zeros #4845