allenai / ScienceWorld

ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
https://sciworld.apps.allenai.org/
Apache License 2.0

Task 5 data is missing in gold-paths-all.zip #49

Closed by yuchenlin 1 year ago

yuchenlin commented 1 year ago

The data for Task 5 is not included in goldpaths-all.zip. Is this expected? Thanks!

PeterAJansen commented 1 year ago

Thanks @yuchenlin for this issue report. It looks like the gold path generator indeed didn't include Task 5 in its task list. During development I generated these in small batches of a few tasks at a time (since a full run over all tasks takes a long time), and when I combined all the tasks into one big list to run overnight, I accidentally left Task 5 out.

I've just pushed a fix to the branch I've been working on, which will eventually be merged into the main branch:

val specificTasks = Array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29)           // Do specific tasks
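A minimal sketch of one way to guard against this kind of omission, written in Python for illustration rather than in the Scala of the gold agent: derive the expected task list from a range and report any gaps. The helper `missing_tasks` and the constant `NUM_TASKS = 30` (tasks 0-29) are assumptions for this example, and the "buggy" list is illustrative, not the exact list that was run.

```python
# Hypothetical helper, not part of ScienceWorld: check a hand-written task list for gaps.
# NUM_TASKS = 30 assumes task indices 0-29, as in the specificTasks array above.
NUM_TASKS = 30


def missing_tasks(specific_tasks, num_tasks=NUM_TASKS):
    """Return the task indices in range(num_tasks) that are absent from specific_tasks."""
    present = set(specific_tasks)
    return [t for t in range(num_tasks) if t not in present]


# Illustrative list with Task 5 accidentally left out.
buggy = [0, 1, 2, 3, 4] + list(range(6, 30))
print(missing_tasks(buggy))  # [5]

# Building the list from a range avoids hand-typing it at all.
fixed = list(range(NUM_TASKS))
print(missing_tasks(fixed))  # []
```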

In the interim, I've generated the paths for Task 5 alone (which took only a few minutes) and added them to the goldpaths-all.zip file on that branch:

https://github.com/allenai/ScienceWorld/tree/exhaustivevalidactions/goldpaths

yuchenlin commented 1 year ago

Thank you so much!

yuchenlin commented 1 year ago

Hi Peter,

Sorry to bother you again. I found that the task IDs in the filenames do not match the task IDs in the current ScienceWorld env. The missing Task 5 is actually the current Task 21 (Task 2-3, measure-melting-point-unknown-substance), while the Task 5 in your new file is find-animal, which is Task 11 in the old mapping and was already included in the old JSON file.

Simply put, would you please generate the gold paths for the task "Task 2-3 measure-melting-point-unknown-substance"? Thank you very much!
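This mix-up is easier to avoid if gold paths are matched by task name rather than by numeric index. A minimal Python sketch, using only the (partial) old and current pairs stated in the comment above; the dictionary and function names are hypothetical and not part of ScienceWorld.

```python
# Current env index -> task name (partial; only the pairs mentioned in this thread).
CURRENT_TASKS = {
    5: "find-animal",
    21: "measure-melting-point-unknown-substance",
}

# Old filename index -> task name (partial; as described in the comment above).
OLD_FILENAME_TASKS = {
    5: "measure-melting-point-unknown-substance",
    11: "find-animal",
}


def old_to_current(old_idx):
    """Translate an old filename index to the current index by going through the task name."""
    name = OLD_FILENAME_TASKS[old_idx]
    by_name = {task_name: idx for idx, task_name in CURRENT_TASKS.items()}
    return by_name[name]


print(old_to_current(5))   # 21
print(old_to_current(11))  # 5
```

Keying on the name means a reordering of the task list cannot silently pair a file with the wrong task.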

MarcCote commented 1 year ago

Just in case you didn't know, there is a flag you can set when calling env.load(taskName, variationIdx, generateGoldPath=True) to ask ScienceWorld to generate the gold path for a given task and variation.

PeterAJansen commented 1 year ago

goldsequences-21.zip

@yuchenlin Here are the regenerated paths for Task 21 (measure-melting-point-unknown-substance).

I think it's probably time to regenerate all the paths, so I've set that running; it may just take a few days.

That said, I just realized that I'm working on the exhaustivevalidactions branch, which has a number of small fixes (particularly to enumerating the full valid action space, including all the aliases for action verbs and most of the possible referent names). This might change the paths somewhat. I'd recommend using that new branch if you can, since it will likely be merged into master under a new release shortly. But if you would like the Task 21 paths generated with the current release, just let me know and I can redo those fairly quickly.

PeterAJansen commented 1 year ago

Actually it only took a day -- here's the full set of gold paths, regenerated on the current exhaustivevalidactions branch.

goldsequences-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29.zip

The new goldpaths-all.zip has been committed to that branch: https://github.com/allenai/ScienceWorld/commit/6ab99ab9864e29d984e9b1f0ee5528725655bbb1

And, just for interest, here's the report: the gold agents solve 99.7% of the task variations (7,183 of 7,207):

Simplifications: noElectricalAction, openDoors, selfWateringFlowerPots, teleportAction
---------------------------------
Total number of variations tested: 7207
Total number of variations with errors in gold path: 24
---------------------------------
  0:                                                         boil   min:  0.78      max:  1.00      avg:  0.99      
  1:                                change-the-state-of-matter-of   min:  1.00      max:  1.00      avg:  1.00      
  2:                                                chemistry-mix   min:  1.00      max:  1.00      avg:  1.00      
  3:                          chemistry-mix-paint-secondary-color   min:  1.00      max:  1.00      avg:  1.00      
  4:                           chemistry-mix-paint-tertiary-color   min:  1.00      max:  1.00      avg:  1.00      
  5:                                                  find-animal   min:  1.00      max:  1.00      avg:  1.00      
  6:                                            find-living-thing   min:  1.00      max:  1.00      avg:  1.00      
  7:                                        find-non-living-thing   min:  1.00      max:  1.00      avg:  1.00      
  8:                                                   find-plant   min:  1.00      max:  1.00      avg:  1.00      
  9:                                                       freeze   min:  0.82      max:  1.00      avg:  0.99      
 10:                                                   grow-fruit   min:  0.46      max:  1.00      avg:  0.98      
 11:                                                   grow-plant   min:  1.00      max:  1.00      avg:  1.00      
 12:                                       identify-life-stages-1   min:  1.00      max:  1.00      avg:  1.00      
 13:                                       identify-life-stages-2   min:  1.00      max:  1.00      avg:  1.00      
 14:                               inclined-plane-determine-angle   min:  1.00      max:  1.00      avg:  1.00      
 15:                       inclined-plane-friction-named-surfaces   min:  1.00      max:  1.00      avg:  1.00      
 16:                     inclined-plane-friction-unnamed-surfaces   min:  1.00      max:  1.00      avg:  1.00      
 17:                                       lifespan-longest-lived   min:  1.00      max:  1.00      avg:  1.00      
 18:                   lifespan-longest-lived-then-shortest-lived   min:  1.00      max:  1.00      avg:  1.00      
 19:                                      lifespan-shortest-lived   min:  1.00      max:  1.00      avg:  1.00      
 20:                        measure-melting-point-known-substance   min: -1.00      max:  1.00      avg:  1.00      
 21:                      measure-melting-point-unknown-substance   min: -1.00      max:  1.00      avg:  0.99      
 22:                                                         melt   min:  1.00      max:  1.00      avg:  1.00      
 23:                               mendelian-genetics-known-plant   min:  1.00      max:  1.00      avg:  1.00      
 24:                             mendelian-genetics-unknown-plant   min:  1.00      max:  1.00      avg:  1.00      
 25:                                              power-component   min:  1.00      max:  1.00      avg:  1.00      
 26:             power-component-renewable-vs-nonrenewable-energy   min:  1.00      max:  1.00      avg:  1.00      
 27:                                            test-conductivity   min: -1.00      max:  1.00      avg:  0.99      
 28:                      test-conductivity-of-unknown-substances   min:  1.00      max:  1.00      avg:  1.00      
 29:                                              use-thermometer   min: -1.00      max:  1.00      avg:  0.99      
---------------------------------
Exporting gold action sequences...
Exporting gold action sequences... (goldsequences-0-1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29.json)
 * Task 0 (variations: 30
 * Task 1 (variations: 30
 * Task 2 (variations: 32
 * Task 3 (variations: 36
 * Task 4 (variations: 36
 * Task 5 (variations: 300
 * Task 6 (variations: 300
 * Task 7 (variations: 300
 * Task 8 (variations: 300
 * Task 9 (variations: 30
 * Task 10 (variations: 126
 * Task 11 (variations: 126
 * Task 12 (variations: 14
 * Task 13 (variations: 10
 * Task 14 (variations: 168
 * Task 15 (variations: 1386
 * Task 16 (variations: 162
 * Task 17 (variations: 125
 * Task 18 (variations: 125
 * Task 19 (variations: 125
 * Task 20 (variations: 436
 * Task 21 (variations: 300
 * Task 22 (variations: 30
 * Task 23 (variations: 120
 * Task 24 (variations: 480
 * Task 25 (variations: 20
 * Task 26 (variations: 20
 * Task 27 (variations: 900
 * Task 28 (variations: 600
 * Task 29 (variations: 540
Completed...
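As a quick cross-check of the headline numbers: the per-task variation counts in the export log sum to the reported total of 7207, and 24 variations with errors give (7207 - 24) / 7207 ≈ 99.7%. The Python sketch below redoes that arithmetic and flags tasks whose minimum score is below 1.00, using a few score rows copied from the report; the regex and helper names are assumptions for illustration, not part of the ScienceWorld tooling.

```python
import re

# Per-task variation counts from the "Exporting gold action sequences" log (Tasks 0-29).
VARIATIONS = [30, 30, 32, 36, 36, 300, 300, 300, 300, 30,
              126, 126, 14, 10, 168, 1386, 162, 125, 125, 125,
              436, 300, 30, 120, 480, 20, 20, 900, 600, 540]
ERRORS = 24  # "Total number of variations with errors in gold path"

total = sum(VARIATIONS)
success_rate = (total - ERRORS) / total
print(total)                  # 7207
print(f"{success_rate:.1%}")  # 99.7%

# A few score rows copied from the report (subset, for illustration).
SCORE_ROWS = """\
  0:                                                         boil   min:  0.78      max:  1.00      avg:  0.99
  1:                                change-the-state-of-matter-of   min:  1.00      max:  1.00      avg:  1.00
 21:                      measure-melting-point-unknown-substance   min: -1.00      max:  1.00      avg:  0.99
"""

# Matches "<idx>: <name> min: <x> max: <y> avg: <z>" with arbitrary spacing.
ROW = re.compile(
    r"^\s*(\d+):\s+(\S+)\s+min:\s+(-?\d+\.\d+)\s+max:\s+(-?\d+\.\d+)\s+avg:\s+(-?\d+\.\d+)"
)


def imperfect_tasks(report_text):
    """Return (index, name, min_score) for every row whose minimum score is below 1.0."""
    flagged = []
    for line in report_text.splitlines():
        m = ROW.match(line)
        if m and float(m.group(3)) < 1.0:
            flagged.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return flagged


print(imperfect_tasks(SCORE_ROWS))
```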
MarcCote commented 1 year ago

@PeterAJansen can you push the script used to generate that data? Also, I think we should use the new task IDs (e.g., 3-1).

PeterAJansen commented 1 year ago

We definitely should modify it to use the new task IDs!

The code to generate the gold paths is already in the repo, just poorly named :-/. The critical bit is the specificTasks line at the top, which is currently a list of all task numbers (0-29). We could change that to a list of the new task IDs, then make sure the environment-loading call uses the string signature instead of the int one (though I forget whether that's handled on the Python or the Scala side).

https://github.com/allenai/ScienceWorld/blob/main/simulator/src/main/scala/scienceworld/goldagent/ExampleGoldAgent.scala
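A hypothetical Python sketch of the normalization step described above: accept either a legacy integer index or a new-style string ID and always hand the loader a string. The gold agent itself is Scala, and the mapping here is deliberately partial; only the pair 21 to "2-3" is stated in this thread, so the remaining entries would have to come from the task definitions in the repo. `LEGACY_TO_NEW_ID` and `to_task_id` are illustrative names, not existing ScienceWorld functions.

```python
from typing import Union

# Hypothetical, deliberately partial mapping from legacy integer indices to
# new-style string task IDs. Only the pair confirmed in this thread is filled in.
LEGACY_TO_NEW_ID = {
    21: "2-3",  # measure-melting-point-unknown-substance
}


def to_task_id(task: Union[int, str]) -> str:
    """Normalize a task reference to the new string ID format."""
    if isinstance(task, str):
        return task  # already a new-style ID such as "3-1"
    try:
        return LEGACY_TO_NEW_ID[task]
    except KeyError:
        raise KeyError(f"no known new-style ID for legacy task index {task}") from None


specific_tasks = [21, "3-1"]
print([to_task_id(t) for t in specific_tasks])  # ['2-3', '3-1']
```

Failing loudly on an unknown index, rather than passing the integer through, keeps a stale numeric ID from silently loading the wrong task.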