askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

fixes CoolObjectAction reward function #95

Closed anisha2102 closed 3 years ago

anisha2102 commented 3 years ago

What?

Corrected the Cooling reward function to implement the expected functionality for RL training.

Why?

The previous implementation marked the CoolObjectAction subgoal as completed as soon as the object was cooled (i.e., placed in the fridge and the fridge door closed). This left the RL agent with little to no incentive to pick the cooled object back up before moving on to the next subgoal. As a result, agents trained with the previous reward function achieved a zero success rate on the CoolObject subgoal.

How?

The CoolObjectAction reward function is modified as follows:

  1. It marks the CoolObject subgoal as completed only after the agent executes the full sequence of actions: Cool Object -> Reopen Fridge -> Pick up cooled object.
  2. It provides intermediate rewards for completing subparts of the cooling subgoal, giving denser reward feedback (see the sketch below).
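For illustration, here is a minimal sketch of the revised completion logic described above. This is not the actual env/reward.py code: the function name cool_object_reward, its boolean arguments, and the reward values are hypothetical stand-ins for the metadata checks and reward constants used in the repository.

```python
# A minimal sketch of the revised CoolObject completion logic
# (hypothetical names, not the actual env/reward.py implementation).

def cool_object_reward(obj_is_cooled, fridge_is_open, obj_in_hand,
                       positive_reward=1.0, step_reward=0.0):
    """Return (reward, done) for the CoolObject subgoal.

    obj_is_cooled  -- object was placed in the fridge and the door was closed
    fridge_is_open -- fridge has been reopened after cooling
    obj_in_hand    -- agent is holding the cooled object
    """
    if obj_is_cooled and obj_in_hand:
        # The subgoal completes only once the cooled object is picked back up.
        return positive_reward, True
    if obj_is_cooled and fridge_is_open:
        # Intermediate reward: fridge reopened after cooling.
        return positive_reward / 2.0, False
    if obj_is_cooled:
        # Intermediate reward: object cooled (placed in fridge, door closed).
        return positive_reward / 2.0, False
    return step_reward, False
```

Under this sketch, `cool_object_reward(True, False, False)` returns `(0.5, False)`, and `done=True` is only returned once the object is back in hand, e.g. `cool_object_reward(True, True, True)` -> `(1.0, True)`. A real implementation would also need to gate the intermediate rewards so each is granted only once per episode.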

Misc.

An agent trained with PPO using the revised CoolObjectAction reward function is able to successfully complete the pick_cool_then_place_in_recep task.

MohitShridhar commented 3 years ago

@anisha2102 thanks for this!

That's right, reward.py doesn't provide dense rewards for each subgoal. This feature could potentially be useful for anyone doing RL with dense rewards, but to be consistent it should be supported for all subgoals, including GotoLocation, HeatObject, etc.

For now, I'll keep this pull request open for anyone who might be interested.

anisha2102 commented 3 years ago

@MohitShridhar

I agree with your point about the inconsistency introduced by the dense rewards. However, just to clarify, I would like to highlight the main issue handled here: the existing reward function is insufficient for driving the agent to complete tasks such as pick_cool_then_place_in_recep.

To illustrate this, the following is the high_pddl_plan for the task pick_cool_then_place_in_recep with a Pot and DiningTable:

  1. GotoLocation (go to the Counter)
  2. PickupObject (pick up the Pot)
  3. GotoLocation (go to the Fridge)
  4. CoolObject (open Fridge -> place Pot -> close fridge -> reopen fridge -> pick up Pot)
  5. GotoLocation (go to DiningTable)
  6. PutObject (place Pot)

Let's focus on the plan starting from Subgoal 4. With the current reward function, the agent gets the cooling reward as soon as it executes open Fridge -> place Pot -> close fridge. It then moves directly on to Subgoal 5 (go to DiningTable) and gets a reward for it even if it never picks the Pot back up from the fridge (reopen fridge -> pick up Pot). Essentially, the agent can never complete Subgoal 6, because there is NO incentive to pick up the cooled object once Subgoal 4 has been prematurely marked as completed.

There are 2 ways to handle this IMO:

  1. Modify the high_pddl_plan to split Subgoal 4 CoolObject into two parts: CoolObject (open Fridge -> place Pot -> close fridge) & PickupObject (reopen fridge -> pick up Pot)
  2. Modify the reward function to mark the cooling subgoal as completed only after the agent has cooled the object AND picked it back up from the fridge. (I have implemented this, but I also added dense supervision, which can be removed for consistency, as you pointed out.)

Also, note that the same issue can occur for the HeatObject subgoal, where there is no incentive to pick up the heated object before moving on to the next subgoal of navigating to the target receptacle location.

Please let me know if this makes sense to you.

MohitShridhar commented 3 years ago

@anisha2102 I see what you mean. Sorry I missed the main issue in the original post. I'll merge this for now, but will add a note that other subgoals don't support dense rewards. If you end up fixing the other ones before me, please consider submitting a pull request. Much appreciated, thanks!

SamNPowers commented 3 years ago

Hi! Thanks for fixing this bug, but I'm trying to use the fix and noticed something: `subgoal` isn't actually defined in CoolObjectAction, which means the code path that should fire with `elif is_obj_cool and state.metadata['lastAction']=='OpenObject':` seems to never get called; otherwise I would expect to see an exception thrown.

Edit: to be clear the line I'm referencing with an undefined "subgoal" is here: https://github.com/anisha2102/alfred/blob/c9506385bfa29cfcd4c943b76f38ad662659c5bb/env/reward.py#L263

Thanks!

anisha2102 commented 3 years ago

Thanks for pointing this out. You can add `subgoal = expert_plan[goal_idx]['planner_action']` in the CoolObjectAction function to quickly resolve it (see the snippet below), and I'll add the fix soon.
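For anyone applying the quick fix by hand, the assignment goes at the top of the reward method, before the branch that references subgoal. The snippet below is only an illustration; the method body is elided, and the get_reward signature is an assumption based on the other reward classes in env/reward.py.

```python
# Illustration only -- the surrounding CoolObjectAction class and the exact
# signature are assumptions based on the other reward classes in env/reward.py.
def get_reward(self, state, prev_state, expert_plan, goal_idx):
    # Define subgoal before any branch that references it, so the
    # `elif is_obj_cool and state.metadata['lastAction'] == 'OpenObject':`
    # path no longer raises a NameError.
    subgoal = expert_plan[goal_idx]['planner_action']
    ...
```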

Regarding your concern about this part of the code never being executed: note that this portion of the code will only run once your agent has successfully learned to execute the first half of the cooling subtask in a task like pick_cool_then_place_in_recep. Specifically, this means an agent that has already executed the high-level actions open fridge -> place object -> close fridge and is now exploring the action of (re)opening the fridge to pick up the cooled object.

Hope this helps.