allenai / ScienceWorld

ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
https://sciworld.apps.allenai.org/
Apache License 2.0

Focusing on object sometimes removes it from list of interactable objects when building admissible commands. #48

Open MarcCote opened 1 year ago

MarcCote commented 1 year ago

Todo python examples/human.py --task-num 6 --var-num 0

1- look around
2- open door to greenhouse, score(8)
3- go to greenhouse, score(9)
4- look around
5- focus on pea plant in flower pot 3, score(50)
6- pick up flower pot 3, score(8)
7- go to hallway
8- open door to kitchen
9- go to kitchen, score(8)
10- focus on red box (losing focus)
11- move flower pot 3 to red box
12- focus on pea plant (not listed in admissible commands!)

PeterAJansen commented 1 year ago

Hmmm, interesting. I was unable to replicate this one, and I wonder what the issue might be if it's not immediately repeatable. (I had initially thought that the pea plant might have died from a lack of water on its journey to the kitchen, changing its name, but I think the flower pots in that task are "infinite watering" to prevent exactly that.)

python examples/human.py --task-num 6 --var-num 0

...
> look 

This room is called the kitchen. In it, you see: 
        the agent
        a substance called air
        a chair. On the chair is: nothing.
        a counter. On the counter is: a bowl (containing a red apple, a banana, an orange, a potato), a drawer.
        a cupboard. The cupboard door is closed. 
        a freezer. The freezer door is closed. 
        a fridge. The fridge door is closed. 
        a glass jar (containing a substance called sodium chloride)
        a lighter
        a oven, which is turned off. The oven door is closed. 
        a painting
        a red box (containing nothing)
        a sink, which is turned off. In the sink is: nothing.
        a substance called soap
        a stopwatch, which is deactivated. 
        a stove, which is turned off. On the stove is: nothing.
        a table. On the table is: a glass cup (containing nothing).
        a thermometer, currently reading a temperature of 10 degrees celsius
You also see:
        A door to the bathroom (that is closed)
        A door to the hallway (that is open)
        A door to the outside (that is closed)
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> inventory

In your inventory, you see:
        a flower pot 5 (containing a pea plant in the reproducing stage with a tall height, soil)
        an orange
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> focus on red box

You focus on the red box.
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> move flower pot to red box

You move the flower pot 5 to the red box.
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> look in red box

Inside the red box is: 
        a flower pot 5 (containing a pea plant in the reproducing stage with a tall height, soil)
Reward: 0
Score: 83
isCompleted: False
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> focus on pea plant

You focus on the pea plant.
Reward: 17
Score: 100
isCompleted: True
'help' lists valid action templates, 'objects' lists valid objects, 'valid' lists valid action-object combinations (long!). 
'goals' lists progress on subgoals.
type 'exit' to quit.
> 
MarcCote commented 1 year ago

Is focus on pea plant actually listed in the admissible commands? E.g., if you tab tab it?

PeterAJansen commented 1 year ago

Ah I see -- my mistake, the action sequence works but "focus on pea plant" is indeed not in the valid actions list under that referent. Since objects can have multiple referents, and enumerating all of them would make the valid actions list huge (especially when they include containers, e.g. X, X in Y, X in Y on Z), currently the input parser chooses a unique referent for each object in the valid actions list. But it's not doing a good job here, since it's picking "living thing" as the referent for the pea plant:

{ "action":"focus on living thing", "template_id":11, "obj_ids":[762], "type_ids":[67] }
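To make the mismatch concrete, here is a minimal sketch of the behaviour described above. It is not ScienceWorld's actual code: the `REFERENTS` table and both helper functions are hypothetical, and only object id 762 and the two names "living thing" and "pea plant" come from this thread.

```python
# Hypothetical referent table: each object id maps to every name the parser accepts.
# Only id 762 and these two names appear in the thread; the structure is an assumption.
REFERENTS = {
    762: ["living thing", "pea plant"],
}

def one_referent_per_object(verb, referents):
    """Emit one valid-action string per object, using only its first referent."""
    return [f"{verb} {names[0]}" for names in referents.values()]

def all_referents(verb, referents):
    """Emit one valid-action string for every referent of every object."""
    return [f"{verb} {name}" for names in referents.values() for name in names]

compact = one_referent_per_object("focus on", REFERENTS)
exhaustive = all_referents("focus on", REFERENTS)

# The parser accepts "focus on pea plant", but the compact list does not contain it.
print("focus on pea plant" in compact)     # False
# Once every referent is enumerated, the string is listed.
print("focus on pea plant" in exhaustive)  # True
```

An agent that only picks from the compact list can never issue "focus on pea plant", even though typing it by hand works.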

Possible fixes are:

PeterAJansen commented 1 year ago

I've implemented the non-breaking Option #1 in this branch: https://github.com/allenai/scienceworld/tree/exhaustivevalidactions

The list of valid actions now returns a separate template for each valid string, and for actions that have multiple arguments it iterates all the possible combinations for each object, for each referent -- so the list can be much, much longer now.

Here's an example:

{"validActions": [
{ "action":"close door to greenhouse", "template_id":1, "obj_ids":[17577], "type_ids":[152] },
{ "action":"close door", "template_id":1, "obj_ids":[17577], "type_ids":[152] },
{ "action":"close greenhouse door", "template_id":1, "obj_ids":[17577], "type_ids":[152] },
{ "action":"close door to kitchen", "template_id":1, "obj_ids":[17562], "type_ids":[152] },
{ "action":"close door", "template_id":1, "obj_ids":[17562], "type_ids":[152] },
{ "action":"close kitchen door", "template_id":1, "obj_ids":[17562], "type_ids":[152] }, ...

Here, the same action (close the greenhouse door) now has the three versions the parser recognizes enumerated: close door to greenhouse, close door, and close greenhouse door. They're not guaranteed to be unique (and many are not), so if they're used the parser will go into ambiguity-resolution mode and ask the agent which object it meant, which creates other complications for agents. But this probably solves more problems than it creates.
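An agent that wants to avoid the ambiguity-resolution prompt could filter out the non-unique strings before choosing an action. The sketch below does this for the six entries shown in the example; the helper name `ambiguous_actions` is hypothetical, and the code only assumes the JSON shape shown above.

```python
import json
from collections import defaultdict

# The six entries from the example above (the real list is much longer).
payload = json.loads("""
{"validActions": [
  {"action": "close door to greenhouse", "template_id": 1, "obj_ids": [17577], "type_ids": [152]},
  {"action": "close door", "template_id": 1, "obj_ids": [17577], "type_ids": [152]},
  {"action": "close greenhouse door", "template_id": 1, "obj_ids": [17577], "type_ids": [152]},
  {"action": "close door to kitchen", "template_id": 1, "obj_ids": [17562], "type_ids": [152]},
  {"action": "close door", "template_id": 1, "obj_ids": [17562], "type_ids": [152]},
  {"action": "close kitchen door", "template_id": 1, "obj_ids": [17562], "type_ids": [152]}
]}
""")

def ambiguous_actions(valid_actions):
    """Return action strings that map to more than one distinct set of object ids."""
    targets = defaultdict(set)
    for entry in valid_actions:
        targets[entry["action"]].add(tuple(entry["obj_ids"]))
    return {action: sorted(objs) for action, objs in targets.items() if len(objs) > 1}

ambiguous = ambiguous_actions(payload["validActions"])
print(ambiguous)  # {'close door': [(17562,), (17577,)]}

# Strings that identify exactly one object and are safe to send as-is.
unambiguous = [e["action"] for e in payload["validActions"] if e["action"] not in ambiguous]
```

Here "close door" is the only string that refers to two different doors; the other four strings each identify a single object.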

It's not exhaustively tested yet -- we'll likely want to run it a bunch to make sure nothing breaks in an environment that has a ton of objects nested deep in containers, each with many possible referents. The string enumeration is an iterator, so if it breaks anywhere it might be when py4j sends the (now) extremely long JSON string of all the valid actions for parsing. Running the random agents (and/or the gold agents) across all the variations might help find these issues.
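To get a feel for how quickly the list grows, here is a rough sketch of enumerating every referent combination for a two-argument template. It is an illustration only, not the branch's implementation: the function name, the template syntax, and the referent lists (including the aliases "box" and "bowl") are assumptions.

```python
from itertools import product

def enumerate_action_strings(template, referents_per_slot):
    """Lazily yield every string for a template such as 'move {} to {}'.

    referents_per_slot holds, for each argument slot, one list of referents
    per candidate object.
    """
    # Flatten each slot into all referents of all candidate objects.
    slots = [[name for names in objects for name in names]
             for objects in referents_per_slot]
    for combo in product(*slots):
        yield template.format(*combo)

# Hypothetical referent lists; "box" and "bowl" are made up for illustration.
movable = [["flower pot 5", "flower pot"], ["orange"]]
containers = [["red box", "box"], ["bowl"]]

strings = list(enumerate_action_strings("move {} to {}", [movable, containers]))
print(len(strings))  # 9: three referents in the first slot times three in the second
```

The count is the product of the referent counts per slot, so a room with many objects that each have several referents can produce a very large list for a single two-argument template.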

PeterAJansen commented 1 year ago

I'm working on testing it -- it looks like it now returns options that aren't recognized by the parser. I'll have a look tomorrow and see if I can figure out where the issue is. :)

PeterAJansen commented 1 year ago

It looks like the valid action generation was not respecting closed containers, so it would potentially generate possible valid actions involving items in closed containers. I've changed it to point to a function that just looks for visible objects, and so far I'm not seeing any red flags.
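As a rough illustration of that fix, the sketch below walks an object tree and stops descending at closed containers. The `Obj` class and `visible_objects` function are hypothetical stand-ins, not the simulator's real types, and the "ceramic cup" inside the cupboard is invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    name: str
    is_container: bool = False
    is_open: bool = True
    contents: list = field(default_factory=list)

def visible_objects(obj):
    """Yield obj and everything inside it, without descending into closed containers."""
    yield obj
    if obj.is_container and not obj.is_open:
        return  # the container itself is visible, its contents are not
    for child in obj.contents:
        yield from visible_objects(child)

kitchen = Obj("kitchen", contents=[
    Obj("red box", is_container=True, is_open=True,
        contents=[Obj("flower pot 5")]),
    Obj("cupboard", is_container=True, is_open=False,
        contents=[Obj("ceramic cup")]),  # hypothetical hidden item
])

names = [o.name for o in visible_objects(kitchen)]
print(names)  # ['kitchen', 'red box', 'flower pot 5', 'cupboard']
```

Generating valid actions only from `names` keeps the hidden item out of the list until the cupboard is opened.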

PeterAJansen commented 1 year ago

@MarcCote possibly related to this, I just fixed a bug in exhaustivevalidactions where items in the inventory were not being enumerated in the valid actions list. I've made a bunch of changes to the branch, so I'm not sure whether this same bug is present in the original branch or not.

https://github.com/allenai/ScienceWorld/commit/8813d398c464ee009564c0ce4c3e1b6f4ce82a9f

But, we should probably plan to move the changes from this branch to main soon, and make a new major release. The benefits of this branch are that it enumerates essentially all the valid action possibilities (with possible action verb aliases, and different possible referents for the objects), meaning that for e.g. LMs that align their generated action to a valid action, the performance will now be much better. But, the cost of this is that it can take a while (sometimes up to seconds) to generate this fantastically large set of actions, so the simulator is much slower. That's my only reservation right now.
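For context on the alignment use case, here is a minimal sketch of snapping a model-generated string to the closest valid action. It uses Python's standard `difflib` as a stand-in for whatever similarity measure a real agent would use; the function name and the two action lists are assumptions for illustration.

```python
from difflib import get_close_matches

def align_to_valid_action(generated, valid_actions, cutoff=0.6):
    """Return the valid action most similar to the generated string, or None."""
    matches = get_close_matches(generated, valid_actions, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical lists: one referent per object vs. every referent enumerated.
compact = ["focus on living thing", "focus on red box"]
exhaustive = compact + ["focus on pea plant"]

# With the exhaustive list, the generated text lands on the intended action.
print(align_to_valid_action("focus on the pea plant", exhaustive))  # focus on pea plant

# With the compact list, the best match (if any) is necessarily a different action.
print(align_to_valid_action("focus on the pea plant", compact))
```

With the compact list the intended string is simply not available, so the aligner can only return a different action or nothing, which is the failure mode this branch is meant to remove.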

MarcCote commented 8 months ago

@PeterAJansen do you think now is a good time to move the changes to the main branch?