[Question] How goldpaths were created?

yukioichida commented 1 year ago

Hi,

I have a question about how the ScienceWorld gold paths were formulated. Are they optimal plans with respect to plan size/time, or were they created from crawled trajectories, similar to TextWorldExpress?

For instance, I examined variation 12 of task 1 (boil marshmallow), and the gold plan has ~147 steps, but an agent can solve this problem in fewer steps by using wait actions to reduce the number of examine actions.

Please let me know if I have missed something.

MarcCote commented 1 year ago

@PeterAJansen can add more details, but ScienceWorld's gold paths were generated using rule-based oracle agents. Even though they have access to privileged state information, they are not optimal in terms of length, but they are generic enough to handle all variations of any given task.

yukioichida commented 1 year ago

Ok, thanks for the answer.

yuchenlin commented 1 year ago

> @PeterAJansen can add more details, but ScienceWorld's gold paths were generated using rule-based oracle agents. Even though they have access to privileged state information, they are not optimal in terms of length, but they are generic enough to handle all variations of any given task.

Hey @PeterAJansen and @MarcCote, would you please share the code for the rule-based agent that generates the oracle data? Thank you!

PeterAJansen commented 1 year ago

Hey @yuchenlin, the agents are indeed bots that solve the problems in (it's hoped) fairly generic and learnable ways. The code for each agent is included with the code for its task -- for example, the simplest tasks (find living thing/non-living thing/etc.) have their agent here:

https://github.com/allenai/ScienceWorld/blob/7851d727c7671da154f9a3e51e9129fe6a0dd4ba/simulator/src/main/scala/scienceworld/tasks/specifictasks/TaskFindLivingNonLiving.scala#L241

More generally, the gold action sequence for a given task is generated by the mkGoldActionSequence() function within each Task, which calls the gold agent bot:

https://github.com/allenai/ScienceWorld/blob/7851d727c7671da154f9a3e51e9129fe6a0dd4ba/simulator/src/main/scala/scienceworld/tasks/specifictasks/TaskFindLivingNonLiving.scala#L157

yuchenlin commented 1 year ago

Thank you so much for your prompt reply! :D

PeterAJansen commented 1 year ago

@yukioichida Apologies for taking so long to reply; I may have been travelling when this was initially opened.

For the heating/cooling tasks, the agent typically tries one device first (e.g. the stove), then falls back to other devices (e.g. the blast furnace) if the first one is broken -- so there are back-off strategies built into the bots to handle the ablations. IIRC, when building the bots, I favoured having them explicitly monitor an object (e.g. something waiting to boil) rather than use the time-skipping wait commands, which is why some of the paths may look longer than they (strictly speaking) need to be.
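To make the two ideas above concrete (backing off to another device, and monitoring versus time-skipping), here is a minimal Python sketch. It is not the Scala bot code from the repository: the `Device` class, the `heat_until_boiling` helper, the action strings, and the assumption that one `wait` covers 10 simulator ticks are all hypothetical and only illustrate why a monitored path ends up longer.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical heat source; `broken` models a disabled device in a variation."""
    name: str
    broken: bool = False


def heat_until_boiling(devices, substance, ticks_to_boil, monitor=True):
    """Return an illustrative action list for boiling `substance`.

    Devices are tried in order; a broken one costs two actions before
    the bot backs off to the next candidate.
    """
    actions = []
    for device in devices:
        actions.append(f"move {substance} to {device.name}")
        actions.append(f"activate {device.name}")
        if device.broken:
            # Back-off: this device does not work, so try the next one.
            continue
        if monitor:
            # Explicit monitoring: one observation per simulated tick.
            actions.extend(f"examine {substance}" for _ in range(ticks_to_boil))
        else:
            # Time skipping (assumption: one wait action covers 10 ticks).
            waits = -(-ticks_to_boil // 10)  # ceiling division
            actions.extend("wait" for _ in range(waits))
            actions.append(f"examine {substance}")
        return actions
    raise RuntimeError("no working heat source found")


devices = [Device("stove", broken=True), Device("blast furnace")]
monitored = heat_until_boiling(devices, "marshmallow", ticks_to_boil=30)
skipped = heat_until_boiling(devices, "marshmallow", ticks_to_boil=30, monitor=False)
print(len(monitored), len(skipped))
```

Under these assumed numbers, the monitored path is several times longer than the time-skipping one, even though both end with the same substance on the same working device.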

For nearly all gold-agent-generated paths, there are very likely shorter paths to be had. The goal with the gold agents was to produce generalizable training data, so that an agent trained on them might learn the idea of searching through the environment (rather than simply going to the gold location), might learn to monitor the changes in an object (rather than just skipping ahead in time), etc.
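The search-versus-shortcut trade-off can be sketched the same way. The snippet below is a toy illustration, not ScienceWorld code: `search_path`, `direct_path`, the room names, and the action strings are hypothetical, and rooms are treated as a flat list rather than a connected map.

```python
def search_path(rooms, target_room):
    """Visit rooms in a fixed order and look around until the target is found."""
    actions = []
    for room in rooms:
        actions.append(f"go to {room}")
        actions.append("look around")
        if room == target_room:
            actions.append("focus on target")
            return actions
    raise ValueError(f"{target_room!r} is not in the room list")


def direct_path(target_room):
    """Shortest plan for an oracle that uses its privileged knowledge directly."""
    return [f"go to {target_room}", "focus on target"]


rooms = ["kitchen", "hallway", "greenhouse", "workshop"]
searched = search_path(rooms, "greenhouse")
direct = direct_path("greenhouse")
print(len(searched), len(direct))
```

The direct plan is shorter, but it only works for the variation where the target happens to be in that room; the search plan carries over to any variation where the target is in one of the listed rooms, which is the generalization the gold agents aim for.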

yukioichida commented 1 year ago

Hi @PeterAJansen

No problem at all, thank you for all the answers.

allenai / ScienceWorld

[Question] How goldpaths were created? #40