Open ericchen321 opened 3 years ago
This is because the state of the PRNGs for action noise and sensor noise is different. They are seeded when the simulator/environment is created but not on reset, so their state is a function of the previous episodes. If you want to fix the random seed per episode, something like `env.seed(hash(env.current_episode.scene_id) + hash(env.current_episode.episode_id))` should work.
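One caveat with the suggestion above: Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is set, so a seed built from it can differ between runs of the same script. A minimal sketch of a run-to-run stable alternative follows. The helper name `episode_seed` is hypothetical and not part of Habitat, and the 32-bit truncation is an assumption about what the seeder accepts.

```python
import hashlib


def episode_seed(scene_id: str, episode_id: str) -> int:
    """Derive a reproducible 32-bit seed from an episode's identifiers."""
    key = f"{scene_id}|{episode_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the first 4 bytes so the value fits seeders that expect 32 bits.
    return int.from_bytes(digest[:4], "big")


seed = episode_seed("data/datasets/van-gogh-room.glb", "49")
# In an evaluation loop this value would be passed to env.seed(seed) after
# each reset, as suggested above; `env` is not constructed in this sketch.
```

Unlike `hash()`, the SHA-256 digest depends only on the input bytes, so the same episode maps to the same seed in every process.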
Hi Erik, thanks for the quick response! That makes sense.
Sorry for asking for help on this again. I have fixed the environment's random seed to `0`, but I am still seeing the discrepancy.
I suspect this is the agent's problem rather than the environment's: I compared sensor readings and noticed that as long as the agent produced the same action, the readings I got from `env.step()` were always identical. The divergence started at the 6th step, where the RGBD agent produced different actions given the same readings.
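For anyone reproducing this kind of comparison, a small sketch of how the first diverging step between two recorded runs could be located. The function name `first_divergence` and the action ids are made up for illustration; they are not taken from the actual evaluation script.

```python
from typing import Optional, Sequence


def first_divergence(run_a: Sequence[int], run_b: Sequence[int]) -> Optional[int]:
    """Return the 0-based index of the first differing action, or None."""
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return step
    # The common prefix matches, but one run may be longer than the other.
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None


# Made-up action ids for illustration only.
run_1 = [1, 1, 2, 1, 1, 3, 1]
run_2 = [1, 1, 2, 1, 1, 2, 1]
print(first_divergence(run_1, run_2))  # 5, i.e. the 6th step
```

The same function can be applied to per-step observation hashes to check whether the inputs or the outputs diverge first.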
Yeah, likely an agent issue. Worth reading through PyTorch's docs on determinism: https://pytorch.org/docs/stable/notes/randomness.html
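As a rough sketch of the settings that page discusses, a helper along these lines is commonly called once before building the agent. The name `seed_everything` is hypothetical, NumPy and PyTorch are treated as optional imports, and these settings reduce but do not guarantee determinism (some CUDA ops may remain non-deterministic).

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the common sources of randomness around a PyTorch agent."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; only the generators above are seeded.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # Ask cuDNN for deterministic kernels and disable its auto-tuner,
    # which may otherwise pick different algorithms from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(7)
```

Sampling from the policy's action distribution also draws from the PyTorch generator, so calling this again before each episode is one way to test whether leftover generator state explains the order dependence.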
Hi Erik, again thanks for looking into this issue. I will look into the doc and see if I can find the source of non-determinism.
Habitat-Lab and Habitat-Sim versions
Habitat-Lab: master (commit ce397)
Habitat-Sim: master (commit 5cb10)
Docs and Tutorials
Did you read the docs? Yes
Did you check out the tutorials? Yes
❓ Questions and Help
Problem
Hello, I am trying to evaluate the V2 RGBD agent on one of the Habitat test point-navigation episodes in a similar fashion to `ppo_agents.py`, but I noticed that the agent produces inconsistent actions when evaluating the same episode. I have fixed the random seed when initializing the agent, so based on what I understand from this post, the agent should be deterministic. So may I ask why it would produce different actions depending on the order in which I evaluated the episode?
Context
I created my evaluation script based on `ppo_agents.py` and made some changes so that it can either evaluate all episodes from a dataset or start the evaluation from a particular episode.
In pseudocode my evaluation process is
I have called `agent.reset()` so the agent's decisions would not be affected by previous episodes. I also fixed `RANDOM_SEED` to `7` as in `ppo_agents.py`.
The episode of interest is from the test scenes data and has `episode-id=49` and `scene-id=data/datasets/van-gogh-room.glb`. I have also confirmed that the environment initially always produced deterministic readings, but the agent's actions became non-deterministic a couple of steps into the episode.