step() now returns the reward gained on that step rather than the absolute score.
Notes:
The previous step() API was designed to mirror TextWorld; this change brings it more closely in line with OpenAI Gym and Jericho.
Both the absolute score and the reward are still available in the 'info' dictionary returned from step().
This is likely a breaking change for any code currently using ScienceWorld, so should we bump the version beyond just rc3?
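A minimal sketch of what the change means for calling code. It assumes a Gym-style 4-tuple return of (observation, reward, isCompleted, info) and assumes the info keys are named "score" and "reward"; StubEnv and run_episode are hypothetical names. StubEnv only replays a fixed list of absolute scores and is not the real ScienceWorld environment.

```python
from typing import Any, Dict, List, Tuple


class StubEnv:
    """Hypothetical stand-in for a ScienceWorld environment (not the real class).

    Replays a scripted list of absolute scores and exposes the new step()
    contract: the second return value is the per-step reward, while the
    running score is kept in info (key names are an assumption).
    """

    def __init__(self, scores: List[int]) -> None:
        self._scores = list(scores)
        self._t = 0
        self._last_score = 0

    def step(self, action: str) -> Tuple[str, int, bool, Dict[str, Any]]:
        score = self._scores[self._t]
        self._t += 1
        reward = score - self._last_score  # reward = change in score on this step
        self._last_score = score
        done = self._t >= len(self._scores)  # stub: episode ends when the script runs out
        info = {"score": score, "reward": reward}
        return f"(observation after '{action}')", reward, done, info


def run_episode(env: StubEnv, actions: List[str]) -> Tuple[int, int]:
    """Accumulate per-step rewards and also track the absolute score from info."""
    total_reward = 0
    final_score = 0
    for action in actions:
        _obs, reward, done, info = env.step(action)
        total_reward += reward        # new API: second value is the per-step reward
        final_score = info["score"]   # code that relied on absolute score reads it from info
        if done:
            break
    return total_reward, final_score


if __name__ == "__main__":
    env = StubEnv([0, 58, 67, 83, 83, 100])
    actions = ["take picture", "focus on picture", "open kitchen door",
               "go kitchen", "look", "move picture in inventory to red box"]
    print(run_episode(env, actions))
```

Callers that previously used the second return value as the score only need to switch to the info entry; callers written against Gym or Jericho conventions can sum the returned rewards directly.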
Example output from running the human.py example:
Gold Path:['open door to kitchen', 'go to kitchen', 'look around', 'focus on counter', 'move counter to red box']
Task Name: task-3-find-non-living-thing
Variation: 0 / 300
Task Description: Your task is to find a(n) non-living thing. First, focus on the thing. Then, move it to the red box in the kitchen.
This room is called the hallway. In it, you see:
the agent
a picture
a substance called air
You also see:
A door to the workshop (that is closed)
A door to the art studio (that is closed)
A door to the kitchen (that is closed)
A door to the living room (that is closed)
A door to the green house (that is closed)
A door to the bedroom (that is closed)
Reward: 0
Score: 0
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!).
'goals' lists progress on subgoals.
type 'exit' to quit.
take picture
You move the picture to the inventory.
Reward: 0
Score: 0
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!).
'goals' lists progress on subgoals.
type 'exit' to quit.
focus on picture
You focus on the picture.
Reward: 58
Score: 58
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!).
'goals' lists progress on subgoals.
type 'exit' to quit.
open kitchen door
The door is now open.
Reward: 9
Score: 67
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!).
'goals' lists progress on subgoals.
type 'exit' to quit.
go kitchen
You move to the kitchen.
Reward: 16
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!).
'goals' lists progress on subgoals.
type 'exit' to quit.
look
This room is called the kitchen. In it, you see:
a stopwatch, which is deactivated.
a freezer. The freezer door is closed.
a table. On the table is: a glass cup (containing nothing).
a substance called air
a cupboard. The cupboard door is closed.
a sink, which is turned off. In the sink is: nothing.
the agent
a painting
a fridge. The fridge door is closed.
a chair. On the chair is: nothing.
a glass jar (containing a substance called sodium chloride)
a substance called soap
a stove, which is turned off. On the stove is: nothing.
a red box (containing nothing)
a oven, which is turned off. The oven door is closed.
a thermometer, currently reading a temperature of 10 degrees celsius
a lighter
a counter. On the counter is: a drawer, a bowl (containing an orange, a red apple, a banana, a potato).
You also see:
A door to the bathroom (that is closed)
A door to the outside (that is closed)
A door to the hallway (that is open)
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!).
'goals' lists progress on subgoals.
type 'exit' to quit.
move picture in inventory to red box
You move the picture to the red box.
Reward: 17
Score: 100
isCompleted: True
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!).
'goals' lists progress on subgoals.
type 'exit' to quit.
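As a sanity check on the transcript, each step's reward is the change in score, so a running sum of the per-step rewards reproduces the score printed after each action. The numbers below are copied from the session; running_score is a hypothetical helper.

```python
from itertools import accumulate
from typing import List

# Per-action values copied from the human.py session (initial reset excluded).
actions = ["take picture", "focus on picture", "open kitchen door",
           "go kitchen", "look", "move picture in inventory to red box"]
rewards = [0, 58, 9, 16, 0, 17]
scores = [0, 58, 67, 83, 83, 100]


def running_score(step_rewards: List[int]) -> List[int]:
    """Rebuild the absolute score after each step by summing per-step rewards."""
    return list(accumulate(step_rewards))


if __name__ == "__main__":
    for action, total in zip(actions, running_score(rewards)):
        print(f"{action}: score {total}")
```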