allenai / ScienceWorld

ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
https://sciworld.apps.allenai.org/
Apache License 2.0
199 stars 24 forks

Is it possible to generate visual observation corresponding to the textual observation? #68

Closed stevenyangyj closed 1 month ago

stevenyangyj commented 5 months ago

Thanks for your help!

MarcCote commented 3 months ago

@PeterAJansen can correct me if I'm wrong, but at the moment there's no easy way to generate a visual observation from the textual description.

PeterAJansen commented 1 month ago

Sorry to be slow responding to this one. I can't think of an easy way of doing this in ScienceWorld, but we have just released DiscoveryWorld, which includes both text (i.e. JSON) and proper visual (2D tile-based) observations. Here's the link for DiscoveryWorld: https://github.com/allenai/discoveryworld

If it /must/ be ScienceWorld that has the visual observations, the potential ways forward that come to mind are:

  1. Manually build a visualizer that has every object in it. This is similar to how DiscoveryWorld works, and the various Nethack/etc visualizers.
  2. Try to put together something automatic with DALL-E/etc. that takes the text observation as input and produces an image as output, with some bias toward whatever past generations looked like, to try to keep it somewhat consistent across steps.
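
To make option 1 a bit more concrete, here is a rough Python sketch of a hand-built visualizer. It is not part of ScienceWorld or DiscoveryWorld. The `TILES` table, the `extract_objects` and `render_grid` helpers, and the sample observation string are all hypothetical, and the text glyphs stand in for real image tiles. It assumes objects can be found by simple substring matching on the observation text, which a real visualizer would likely need to replace with something more robust.

```python
from typing import Dict, List

# Hypothetical object-to-glyph table. A real visualizer would map to image
# tiles and would need an entry for every object type in the environment.
TILES: Dict[str, str] = {
    "fridge": "F",
    "stove": "S",
    "table": "T",
    "thermometer": "t",
}
EMPTY = "."
UNKNOWN = "?"


def extract_objects(observation: str) -> List[str]:
    """Return known object names in order of first appearance in the text."""
    text = observation.lower()
    found = [(text.find(name), name) for name in TILES if name in text]
    return [name for _, name in sorted(found)]


def render_grid(objects: List[str], width: int = 4) -> List[str]:
    """Lay objects out left to right, top to bottom, on a fixed-width grid."""
    cells = [TILES.get(obj, UNKNOWN) for obj in objects]
    rows = max(1, -(-len(cells) // width))  # ceiling division, at least one row
    cells += [EMPTY] * (rows * width - len(cells))
    return ["".join(cells[r * width:(r + 1) * width]) for r in range(rows)]


if __name__ == "__main__":
    # Made-up observation text, not copied from actual ScienceWorld output.
    obs = "This room is called the kitchen. In it, you see: a table, a fridge, a stove."
    for row in render_grid(extract_objects(obs)):
        print(row)
```

The main cost of this route is the one mentioned above: the tile table has to cover every object, and anything missing from it is simply not drawn.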