allenai / ScienceWorld

ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
https://sciworld.apps.allenai.org/
Apache License 2.0

Task IDs parity between paper and code #44

Closed MarcCote closed 1 year ago

MarcCote commented 1 year ago

Currently, there is a mismatch between task IDs mentioned in the paper and the ones used in the code.

For instance, in the code the "Measurement" task has ID 10-*:


> ScienceWorldEnv("").getTaskNames()
['task-1-boil',
 'task-1-change-the-state-of-matter-of',
 'task-1-freeze',
 'task-1-melt',
 'task-10-measure-melting-point-(known-substance)',      <---Mismatch
 'task-10-measure-melting-point-(unknown-substance)',    <---Mismatch
 'task-10-use-thermometer',                              <---Mismatch
 'task-2-power-component',
 'task-2-power-component-(renewable-vs-nonrenewable-energy)',
 'task-2a-test-conductivity',
 'task-2a-test-conductivity-of-unknown-substances',
 'task-3-find-animal',
 'task-3-find-living-thing',
...

whereas in the paper it is ID 2.

This PR addresses the issue by removing the "task-ID-" prefix from the task names. It also changes the env.load(...) function to accept a proper task ID when loading a task, e.g. env.load("2-3").
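For anyone who still has old-style names in configs or logs, the prefix removal can be approximated with a small helper. This is only a sketch, not the code from this PR: `strip_task_prefix` is a hypothetical name, and the regex assumes every old prefix looks like "task-", a number, an optional lowercase letter (as in "task-2a-"), and a dash.

```python
import re

# Assumed shape of the old prefix: "task-<number><optional letter>-",
# e.g. "task-10-" or "task-2a-". Not taken from the ScienceWorld source.
_OLD_PREFIX = re.compile(r"^task-\d+[a-z]?-")


def strip_task_prefix(name: str) -> str:
    """Return the task name without a leading "task-ID-" prefix, if present."""
    return _OLD_PREFIX.sub("", name)


old_names = [
    "task-1-boil",
    "task-10-use-thermometer",
    "task-2a-test-conductivity",
    "task-10-measure-melting-point-(known-substance)",
]

for old in old_names:
    print(f"{old} -> {strip_task_prefix(old)}")
```

Names that are already in the new form pass through unchanged, so the helper can be applied to mixed inputs.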

Lastly, a Python script is available to list all task names, their IDs, and their number of variations:

python scripts/list_scienceworld_task.py

TASK LIST:
   1-1                                                boil      (  30 variations)
   1-2                                                melt      (  30 variations)
   1-3                                              freeze      (  30 variations)
   1-4                       change-the-state-of-matter-of      (  30 variations)
   2-1                                     use-thermometer      ( 540 variations)
   2-2             measure-melting-point-(known-substance)      ( 436 variations)
   2-3           measure-melting-point-(unknown-substance)      ( 300 variations)
   3-1                                     power-component      (  20 variations)
   3-2  power-component-(renewable-vs-nonrenewable-energy)      (  20 variations)
   3-3                                   test-conductivity      ( 900 variations)
   3-4             test-conductivity-of-unknown-substances      ( 600 variations)
   4-1                                   find-living-thing      ( 300 variations)
   4-2                               find-non-living-thing      ( 300 variations)
   4-3                                          find-plant      ( 300 variations)
   4-4                                         find-animal      ( 300 variations)
   5-1                                          grow-plant      ( 126 variations)
   5-2                                          grow-fruit      ( 126 variations)
   6-1                                       chemistry-mix      (  32 variations)
   6-2               chemistry-mix-paint-(secondary-color)      (  36 variations)
   6-3                chemistry-mix-paint-(tertiary-color)      (  36 variations)
   7-1                            lifespan-(longest-lived)      ( 125 variations)
   7-2                           lifespan-(shortest-lived)      ( 125 variations)
   7-3        lifespan-(longest-lived-then-shortest-lived)      ( 125 variations)
   8-1                              identify-life-stages-1      (  14 variations)
   8-2                              identify-life-stages-2      (  10 variations)
   9-1                      inclined-plane-determine-angle      ( 168 variations)
   9-2            inclined-plane-friction-(named-surfaces)      (1386 variations)
   9-3          inclined-plane-friction-(unnamed-surfaces)      ( 162 variations)
  10-1                    mendelian-genetics-(known-plant)      ( 120 variations)
  10-2                  mendelian-genetics-(unknown-plant)      ( 480 variations)
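If you need the ID-to-name mapping programmatically, one option is to parse the listing above. The sketch below assumes the output format is exactly as printed here (ID, name, then "(N variations)"); `parse_task_list` is a hypothetical helper, not part of ScienceWorld, and only a few rows are pasted in to keep it short.

```python
import re

# A few rows copied from the listing above; paste the full output in practice.
LISTING = """\
   1-1                                                boil      (  30 variations)
   2-3           measure-melting-point-(unknown-substance)      ( 300 variations)
   9-2            inclined-plane-friction-(named-surfaces)      (1386 variations)
  10-2                  mendelian-genetics-(unknown-plant)      ( 480 variations)
"""

# Assumed row layout: "<id>  <name>  (<count> variations)".
ROW = re.compile(r"^\s*(\d+-\d+)\s+(\S+)\s+\(\s*(\d+) variations\)\s*$")


def parse_task_list(text):
    """Map task ID -> (task name, number of variations)."""
    tasks = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # skip headers and blank lines
            task_id, name, count = match.groups()
            tasks[task_id] = (name, int(count))
    return tasks


TASKS = parse_task_list(LISTING)
# Reverse lookup: new-style task name -> task ID.
NAME_TO_ID = {name: task_id for task_id, (name, _) in TASKS.items()}

print(TASKS["2-3"])
print(NAME_TO_ID["boil"])
```

Combined with a prefix-stripping step, the reverse lookup gives a path from an old "task-ID-name" string to the new ID.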

MarcCote commented 1 year ago

@aphedges hopefully this PR is not breaking anything on your side. You can still use the former task name convention. Let me know if that's not the case.

aphedges commented 1 year ago

> @aphedges hopefully this PR is not breaking anything on your side. You can still use the former task name convention. Let me know if that's not the case.

It might break some stuff, but we have enough internal breakage that we already have scripts to do task name translation.

ISI is currently on break, so it won't be something we will be discussing until January.

MarcCote commented 1 year ago

Once I'm done with https://github.com/allenai/ScienceWorld/pull/37, there will be some breaking changes too. I plan to make ScienceWorld compatible with Gymnasium to handle vectorization.