Is it possible to generate visual observation corresponding to the textual observation?

stevenyangyj commented 5 months ago

Thanks for your help!

MarcCote commented 3 months ago

@PeterAJansen can correct me if I'm wrong, but at the moment there's no easy way to generate a visual observation from the textual description.

PeterAJansen commented 1 month ago

Sorry to be slow responding to this one. I can't think of an easy way of doing this in ScienceWorld, but we have just released DiscoveryWorld, which includes both text (i.e. JSON) and proper visual (2D tile-based) observations. Here's the link for DiscoveryWorld: https://github.com/allenai/discoveryworld

If it /must/ be ScienceWorld that has the visual observations, the potential ways forward that come to mind are:

1. Manually build a visualizer that covers every object. This is similar to how DiscoveryWorld works, and the various NetHack-style visualizers.
2. Try to put together something automatic with DALL-E or a similar model that takes the text observation as input and produces an image as output, with some bias toward whatever past generations looked like, to try and keep it somewhat consistent across steps.
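
For anyone who wants to try the first option, here is a rough sketch of the idea, not anything that ships with ScienceWorld. It assumes the observation is a plain string that mentions objects by name. The `TILES` vocabulary, the grid size, and the sample observation are all made up for illustration, and a real visualizer would map names to sprite images rather than characters.

```python
from typing import Dict, List

# Hypothetical object vocabulary -> single-character tile (an assumption for
# this sketch; a real visualizer would map each name to a sprite image).
TILES: Dict[str, str] = {
    "fridge": "F",
    "stove": "S",
    "table": "T",
    "sink": "K",
}
EMPTY = "."


def find_objects(observation: str, vocab: Dict[str, str]) -> List[str]:
    """Return vocabulary entries mentioned in the text, ordered by first appearance.

    Uses naive substring matching, so "table" would also match "vegetable".
    """
    text = observation.lower()
    hits = [(text.find(name), name) for name in vocab if name in text]
    return [name for _, name in sorted(hits)]


def render(observation: str, width: int = 4, height: int = 2) -> List[str]:
    """Lay out detected objects left-to-right, top-to-bottom on a fixed grid."""
    cells = [TILES[name] for name in find_objects(observation, TILES)]
    cells = cells[: width * height]                      # drop anything that does not fit
    cells += [EMPTY] * (width * height - len(cells))     # pad the rest with empty tiles
    return ["".join(cells[r * width:(r + 1) * width]) for r in range(height)]


if __name__ == "__main__":
    # Made-up observation text, only loosely modelled on a room description.
    obs = "This room is called the kitchen. In it, you see: a table, a fridge, a stove."
    print("\n".join(render(obs)))
```

Note that the text observation carries no spatial information, so the placement here is arbitrary (order of mention). Keeping objects in stable positions across steps would need extra bookkeeping on top of this.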

allenai / ScienceWorld

Is it possible to generate visual observation corresponding to the textual observation? #68