facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License

PPO agent produces non-deterministic results when evaluating the same episode #658

Open ericchen321 opened 3 years ago

ericchen321 commented 3 years ago

Habitat-Lab and Habitat-Sim versions

Habitat-Lab: master (commit ce397)
Habitat-Sim: master (commit 5cb10)

Docs and Tutorials

Did you read the docs? Yes
Did you check out the tutorials? Yes

❓ Questions and Help

Problem

Hello, I am trying to evaluate the V2 RGBD agent on one of the Habitat test point-navigation episodes in a similar fashion to ppo_agents.py, but I noticed that the agent produces inconsistent actions when evaluating the same episode, depending on whether I evaluate that episode directly or reach it after evaluating other episodes first.

I have fixed the random seed when initializing the agent, so based on what I understand from this post, the agent should be deterministic. So may I ask why it produces different actions depending on the order in which I evaluated the episode?

Context

I created my evaluation script based on ppo_agents.py and made some changes so it can either evaluate all episodes from a dataset or start the evaluation from a particular episode.

In pseudocode, my evaluation process is:

# iterate until we find the first episode to evaluate
if not evaluate_all:
    observations = env.reset()
    while (env.current_episode.episode_id != ep_id
           or env.current_episode.scene_id != sc_id):
        observations = env.reset()

count_episodes = 0
while count_episodes < num_episodes:
    agent.reset()
    observations = env.reset()
    while not env.episode_over:
        action = agent.act(observations)
        observations = env.step(action)
    count_episodes += 1

I have called agent.reset() so the agent's decisions would not be affected by previous episodes. I also fixed RANDOM_SEED to 7, as in ppo_agents.py.

The episode of interest is from the test scenes data, with episode_id=49 and scene_id=data/datasets/van-gogh-room.glb. I have also confirmed that the environment initially always produced deterministic readings, but the agent's actions became non-deterministic a couple of steps into the episode.

erikwijmans commented 3 years ago

This is because the state of the PRNGs for action noise and sensor noise is different. Those get seeded at simulator/environment creation but not on reset, so their state will be a function of the previous episodes. If you want to fix the random seed based on the episode, doing something like env.seed(hash(env.current_episode.scene_id) + hash(env.current_episode.episode_id)) should work.
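For concreteness, a minimal sketch of how that could slot into the evaluation loop above (agent, env, count_episodes, and num_episodes are carried over from your pseudocode; the rest is an assumption, not a confirmed Habitat-Lab pattern):

count_episodes = 0
while count_episodes < num_episodes:
    agent.reset()
    observations = env.reset()
    # Re-seed the simulator PRNGs from the episode identity, so the
    # action/sensor noise state no longer depends on which episodes
    # were evaluated before this one.
    ep = env.current_episode
    env.seed(hash(ep.scene_id) + hash(ep.episode_id))
    while not env.episode_over:
        action = agent.act(observations)
        observations = env.step(action)
    count_episodes += 1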

ericchen321 commented 3 years ago

Hi Erik, thanks for the quick response! That makes sense.

ericchen321 commented 3 years ago

Sorry for asking for help on this again - I have fixed the environment's random seed to 0, but I'm still seeing the discrepancy.

I suspect this is the agent's problem rather than the environment's, because I compared sensor readings and noticed that as long as the agent produced the same actions, the readings I got from env.step() were always identical. The divergence happened at the 6th step, where, given identical readings, the RGBD agent produced different actions.
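For reference, a sketch of the kind of step-by-step comparison I mean (compare_runs and the recorded per-step lists are illustrative names only, not Habitat-Lab APIs):

import numpy as np

def compare_runs(obs_a, acts_a, obs_b, acts_b):
    # obs_*: per-step observation dicts recorded from two replays of the
    # same episode; acts_*: the actions the agent took at each step.
    for t, (oa, ob) in enumerate(zip(obs_a, obs_b)):
        same_obs = all(np.array_equal(oa[k], ob[k]) for k in oa)
        same_act = acts_a[t] == acts_b[t]
        if not (same_obs and same_act):
            print(f"first divergence at step {t}: "
                  f"observations equal={same_obs}, actions equal={same_act}")
            return t
    return None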

erikwijmans commented 3 years ago

Yeah, likely an agent issue. Worth reading through PyTorch's docs on determinism: https://pytorch.org/docs/stable/notes/randomness.html
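For anyone hitting this later, a minimal sketch of the knobs that page covers (the seed value is just an example):

import os
import random

import numpy as np
import torch

SEED = 7
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)  # seeds the CPU and all CUDA devices

# Make cuDNN choose deterministic kernels and disable benchmarking,
# which otherwise selects kernels non-deterministically.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Error out on ops that have no deterministic implementation
# (available in PyTorch >= 1.8).
torch.use_deterministic_algorithms(True)

# Needed for deterministic cuBLAS matmuls on CUDA >= 10.2.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

Note that CUBLAS_WORKSPACE_CONFIG has to be set before the first CUDA call, so exporting it in the shell is safer than setting it inside the script.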

ericchen321 commented 3 years ago

Hi Erik, thanks again for looking into this issue. I will read through the doc and see if I can find the source of the non-determinism.