allenai / discoveryworld

A virtual environment for developing and evaluating automated scientific discovery agents.
https://arxiv.org/abs/2406.06769
Apache License 2.0

Option to remove navigation/embodiment requirements for a task #4

Open PeterAJansen opened 3 weeks ago

PeterAJansen commented 3 weeks ago

We've had an external feature request asking whether the navigation/embodiment aspects of tasks (e.g. object manipulation) could be removed by setting a flag (e.g. embodiment = false). The goal would be to isolate the scientific discovery aspects of the tasks from other skills, even more so than the unit tests do.

Adding this as a feature request so we can start a thread on how this might be accomplished, since there are a number of implementation routes/challenges.

Some of the challenges here are:

Dandelionym commented 2 weeks ago

Dear author,

Really appreciate this excellent work! I have some feedback that I hope will be valuable.

  1. The benchmark gives the agent many actions to take. Actions such as moving left and right seem meaningless to me (at least, I don't think this is a good setting).
  2. An LLM with a long prompt and too much prompt engineering generally fails to finish the game.
  3. It is hard to plug in a new algorithm without clear API documentation and examples.
  4. The provided random agent is useful, but it does not account for adapting to other cases, and the same goes for the prompt for location transitions (as I read the issue above).
  5. I hope the authors can release a code example and modify the code base to remove location transitions. That would be helpful.

Thank you all for this excellent work. :-)

MarcCote commented 2 weeks ago

@Dandelionym thanks for sharing additional feedback. Do you have any ideas on how best to remove the navigation actions and still have the tasks make sense in a multi-modal environment?

Even for pure text-based games (see ScienceWorld and TextWorld), a minimum of spatial navigation is needed. If we abstract it away completely, all objects can be interacted with at all times, which removes a big chunk of partial observability.
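
To make the trade-off concrete, here is a minimal Python sketch of what such a flag could do in a toy world. Everything in it (`ToyWorld`, `accessible_objects`, `move`, the `embodiment` field, and the room and object names) is hypothetical and for illustration only; it is not the DiscoveryWorld API.

```python
from dataclasses import dataclass


@dataclass
class ToyWorld:
    """Hypothetical toy world, NOT the DiscoveryWorld API."""

    rooms: dict[str, list[str]]  # room name -> objects located in that room
    agent_room: str              # room the agent currently occupies
    embodiment: bool = True      # assumed flag, as floated in this thread

    def accessible_objects(self) -> list[str]:
        """Return the objects the agent can interact with right now."""
        if self.embodiment:
            # Embodied: only objects in the agent's current room are reachable.
            return list(self.rooms[self.agent_room])
        # Navigation abstracted away: every object in every room is reachable.
        return [obj for objs in self.rooms.values() for obj in objs]

    def move(self, room: str) -> None:
        """Navigate to another room (only matters when embodiment is on)."""
        if room not in self.rooms:
            raise ValueError(f"unknown room: {room}")
        self.agent_room = room


layout = {"lab": ["microscope", "petri dish"], "field": ["soil sample"]}

embodied = ToyWorld(rooms=layout, agent_room="lab")
print(embodied.accessible_objects())  # ['microscope', 'petri dish']

flat = ToyWorld(rooms=layout, agent_room="lab", embodiment=False)
print(flat.accessible_objects())  # ['microscope', 'petri dish', 'soil sample']
```

With the flag off, the agent never has to discover that the soil sample exists or where it is, which is the loss of partial observability described above.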

PeterAJansen commented 2 weeks ago

@Dandelionym thanks again. As an additional follow-up regarding API documentation and agents: do you have specific questions we can help with about the API documentation ( https://github.com/allenai/discoveryworld , Section 3: "API Documentation") or about the agents other than the random agent (e.g. the LLM agent pseudocode scaffold in the API documentation, or the agents from the paper, whose code is included in this repository at https://github.com/allenai/discoveryworld/tree/main/agents )?