Closed sparisi closed 2 years ago
The reason 'success' is always 0 and the environment never returns 'success_reward' at the goal is that success is only registered when the `STOP` action is issued, and `get_action_shortest_path` never returns that action.

For my purposes I do not use `STOP` actions, and I would like the episode to end with 'success' as soon as the agent is close to the goal.
Is there a flag I can pass (maybe in the YAML?) to achieve that? Or a way to override `Success`, or to create and pass a custom terminal condition? Usually in RL the episode ends automatically when the agent reaches the goal.
No, we currently don't support removing the stop action, given its importance in Embodied AI tasks (a task like ObjectNav, for instance, would be much less meaningful without a stop action).
You could, however, have an oracle that calls `STOP` when a condition is reached. This usually requires hard-coding the condition into the agent with privileged information.
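Such an oracle can be sketched as a thin wrapper that tracks a privileged distance-to-goal signal and substitutes `STOP` once the agent is within the success radius. This is only an illustrative sketch: the gym-style `step`/`reset` interface, the `STOP`/`FORWARD` action indices, the `'distance_to_goal'` info key, and `DummyPointNavEnv` are stand-ins, not the real `habitat.RLEnv` API.

```python
# Sketch of the "oracle stop" idea, assuming a gym-style env whose info
# dict exposes a privileged 'distance_to_goal' signal. All names below
# (STOP, FORWARD, DummyPointNavEnv) are illustrative assumptions.

STOP = 0     # assumed action index for STOP
FORWARD = 1  # assumed action index for moving toward the goal


class OracleStopWrapper:
    """Substitutes STOP for the policy's action once the previous step
    left the agent within success_distance of the goal."""

    def __init__(self, env, success_distance=0.2):
        self.env = env
        self.success_distance = success_distance
        self._last_distance = float("inf")

    def reset(self):
        self._last_distance = float("inf")
        return self.env.reset()

    def step(self, action):
        # Privileged check: override the policy's action with STOP
        # when the agent is already inside the success radius.
        if self._last_distance < self.success_distance:
            action = STOP
        obs, reward, done, info = self.env.step(action)
        self._last_distance = info.get("distance_to_goal", float("inf"))
        return obs, reward, done, info


class DummyPointNavEnv:
    """Toy environment for illustration: each FORWARD step shrinks the
    distance to the goal; STOP ends the episode and grants the success
    reward only if the agent is inside the success radius."""

    def __init__(self):
        self.distance = 0.5

    def reset(self):
        self.distance = 0.5
        return {}

    def step(self, action):
        if action == STOP:
            success = self.distance < 0.2
            reward = 2.5 if success else 0.0
            return {}, reward, True, {"success": int(success),
                                      "distance_to_goal": self.distance}
        self.distance -= 0.2
        return {}, 0.0, False, {"success": 0,
                                "distance_to_goal": self.distance}
```

With this wrapper, a policy that only ever outputs `FORWARD` still finishes the episode with success, because the wrapper issues `STOP` on its behalf at the goal.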
I think it would be easier to check the terminal condition without `STOP` and then call reset. In any case, yes, I would have to hard-code the SPL success condition into my wrapper. For now I have simply commented out the check for `STOP` in the source code, and it works.
It would still be useful to simply have a flag that removes the need for `is_stop_called` in `Success` (something like `is_stop_required`). Just giving a suggestion :)
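For concreteness, the suggested flag could gate the success check like this. This is a hypothetical sketch of the proposal, not Habitat-Lab code: `is_stop_required` does not exist in the library, and the function below only mirrors the logic, not the actual `Success` measure class.

```python
def compute_success(distance_to_goal, success_distance,
                    is_stop_called, is_stop_required=True):
    """Hypothetical success check with an optional is_stop_required flag.

    With is_stop_required=True this mirrors the current behavior
    (success only if STOP was called inside the success radius);
    with is_stop_required=False, proximity alone is enough.
    """
    at_goal = distance_to_goal < success_distance
    if is_stop_required:
        return 1.0 if (is_stop_called and at_goal) else 0.0
    return 1.0 if at_goal else 0.0
```

Flipping the flag to `False` would give exactly the behavior asked for in this issue: the episode counts as successful as soon as the agent is within `success_distance`, with no `STOP` action required.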
@sparisi Feel free to re-open the issue, if you still have questions.
Habitat-Lab and Habitat-Sim versions
Habitat-Lab: master Habitat-Sim: master
I have a navigation task, and in the YAML I have `SUCCESS` in `MEASUREMENTS`. `SUCCESS_REWARD` is 2.5 and `SUCCESS_DISTANCE` is 0.2. I also set the goal radius to 0.2 in the JSON file. I run `get_action_shortest_path` with `success_distance=env.unwrapped._core_env_config.TASK.SUCCESS_DISTANCE` (0.2) to find the optimal trajectory (the function finds a 51-step trajectory), and then I run the actions to see what the optimal rewards will be. Image rendering looks fine, and the distance between the agent and the goal decreases. However, at the end `info['success']` is still 0, the reward is small (the highest it gets is mid-trajectory, and it is 0.2x), and the agent is still far from the goal (0.36, which is larger than 0.2). Also, the `done` flag is False (I expect it to return True when the agent reaches the goal). Since the path is found with `get_action_shortest_path`, shouldn't the agent get closer than 0.2 to the goal, get a reward close to 2.5, and have `info['success']` equal to 1 at the last step of the path?

The scene is apartment_0 from Replica.
Example of a trajectory:
EDIT:
The 2nd component of the position seems wrong, so I changed it to the value I get from `sample_navigable_point`, which is -1.3747652. Now the agent gets closer than 0.2 to the goal, but success is still 0 and `done` is False.