[Feature]: Retry on failure functionality

neubig commented 3 months ago

What problem or use case are you trying to solve?

Sometimes models fail to complete a task correctly, and it would help to start over from the beginning. There are a few examples of this in the agent literature:

Aider recently introduced a SWE-bench harness that retries when tests and linting don't pass.
@Jiayi-Pan has work on Evaluation and Refinement for web agents that uses a reward model to judge when a web task has failed, a reset mechanism to return to the beginning, and a method for improving the prompts based on reflexion.
@niansong1996 has a method, LEVER, that uses a learned verifier to rerank code generation candidates using their execution results.

Describe the UX of the solution you'd like

Ideally, this would be something that could be implemented in a general way, so that we could implement different strategies with a shared interface. For instance:

from abc import ABC, abstractmethod


class ResetStrategy(ABC):

    @abstractmethod
    def initialize_state(self):
        """Take note of the initial state that should be reset to."""
        ...

    @abstractmethod
    def verify(self, *args, **kwargs) -> bool:
        """Return True if the agent succeeded, False if it has reached a failure state."""
        ...

    @abstractmethod
    def reset(self, *args, **kwargs):
        """Perform a reset, e.g. back to the state recorded by initialize_state()."""
        ...

    @abstractmethod
    def message_on_reset(self, *args, **kwargs):
        """Create a message to the agent upon reset (e.g. a task with a prompt based on reflexion)."""
        ...

Then, when using OpenDevin, we could choose an option that says "retry N times when you get stuck", and select the strategy that is used to do so.
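As a rough sketch of how a "retry N times" option could drive the interface above, here is a hypothetical `run_with_retries` loop. It assumes `verify()` takes no arguments and returns True on success (the proposal leaves the signatures open); `run_agent`, `max_retries`, and the way the reset message is prepended to the task are illustrative choices, not existing OpenDevin APIs.

```python
from typing import Callable, Optional, Protocol


class ResetStrategy(Protocol):
    """Structural copy of the four-method interface proposed above."""

    def initialize_state(self) -> None: ...
    def verify(self) -> bool: ...
    def reset(self) -> None: ...
    def message_on_reset(self) -> Optional[str]: ...


def run_with_retries(
    run_agent: Callable[[str], None],  # hypothetical: runs the agent on a prompt
    task: str,
    strategy: ResetStrategy,
    max_retries: int = 3,
) -> bool:
    """Run the agent once, then retry up to max_retries times on failure.

    Returns True as soon as an attempt passes strategy.verify().
    """
    strategy.initialize_state()
    prompt = task
    for attempt in range(max_retries + 1):
        run_agent(prompt)
        if strategy.verify():
            return True
        if attempt == max_retries:
            break  # out of retries; leave the final state for inspection
        strategy.reset()
        # Optionally prepend feedback (e.g. a reflexion-style hint) to the task.
        extra = strategy.message_on_reset()
        prompt = f"{extra}\n\n{task}" if extra else task
    return False
```

In this sketch the state is reset only between attempts; whether to also reset after the last failed attempt is a design choice the proposal leaves open.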

Do you have thoughts on the technical implementation?

The actual reset strategies would vary based on the task. For instance:

AiderResetStrategy (code reference):
- initialize: save the repository's current git commit as commit_id
- verify: tests and linting pass
- reset: git checkout commit_id
- message_on_reset: no-op
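A minimal sketch of the Aider-style strategy, assuming the caller supplies the test and lint commands (the `GitResetStrategy` name and the example commands are hypothetical). One deliberate deviation from the bullet list: `reset` uses `git reset --hard` plus `git clean -fd` rather than `git checkout commit_id`, because checking out the commit that HEAD already points to keeps uncommitted edits in the working tree.

```python
import subprocess
from typing import List, Optional, Sequence


class GitResetStrategy:
    """Hypothetical Aider-style strategy: verify = every check command succeeds,
    reset = restore the repository to the commit saved at initialization."""

    def __init__(self, repo_dir: str, checks: Sequence[List[str]]):
        self.repo_dir = repo_dir
        # Assumed commands, e.g. [["pytest", "-q"], ["flake8", "."]]
        self.checks = checks
        self.commit_id: Optional[str] = None

    def _git(self, *args: str) -> str:
        result = subprocess.run(
            ["git", *args], cwd=self.repo_dir,
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()

    def initialize_state(self) -> None:
        # Record the current commit so we can return to it later.
        self.commit_id = self._git("rev-parse", "HEAD")

    def verify(self) -> bool:
        # Success only if every test/lint command exits with status 0.
        return all(
            subprocess.run(cmd, cwd=self.repo_dir).returncode == 0
            for cmd in self.checks
        )

    def reset(self) -> None:
        if self.commit_id is None:
            raise RuntimeError("call initialize_state() first")
        # Discard tracked-file edits and new commits, then untracked files/dirs.
        self._git("reset", "--hard", self.commit_id)
        self._git("clean", "-fd")

    def message_on_reset(self) -> Optional[str]:
        return None  # no-op, as in the proposal
```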
EvalRefineResetStrategy (code reference):
- initialize: save the current web page as initial_page
- verify: the reward model's judgment is positive
- reset: goto(initial_page)
- message_on_reset: reflexion prompt
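A comparable sketch of the web-agent strategy. The `browser` object (anything with a `url` attribute and a `goto()` method), the `reward_model` callable, the `reflect` callable that would produce a reflexion-style self-critique, and the `record()` helper are all hypothetical stand-ins, not APIs from the referenced code; "positive" is interpreted here as a score above 0.

```python
from typing import Callable, List, Optional


class EvalRefineResetStrategy:
    """Hypothetical web-agent strategy: reward-model check, goto() reset, reflexion prompt."""

    def __init__(
        self,
        browser,  # assumed: has a .url attribute and a .goto(url) method
        reward_model: Callable[[List[str]], float],  # assumed: scores a trajectory
        reflect: Callable[[List[str]], str],  # assumed: e.g. an LLM self-critique call
    ):
        self.browser = browser
        self.reward_model = reward_model
        self.reflect = reflect
        self.trajectory: List[str] = []
        self.initial_page: Optional[str] = None
        self.last_reflection = ""

    def initialize_state(self) -> None:
        # Remember the page the task started on.
        self.initial_page = self.browser.url

    def record(self, action: str) -> None:
        # Log each agent action so the reward model can judge the trajectory.
        self.trajectory.append(action)

    def verify(self) -> bool:
        # Treat a score above 0 as "the reward model is positive".
        return self.reward_model(self.trajectory) > 0

    def reset(self) -> None:
        if self.initial_page is None:
            raise RuntimeError("call initialize_state() first")
        # Reflect on the failed attempt before discarding it.
        self.last_reflection = self.reflect(self.trajectory)
        self.trajectory = []
        self.browser.goto(self.initial_page)

    def message_on_reset(self) -> Optional[str]:
        # Reflexion-style prompt built from the failed attempt's critique.
        return (
            "Your previous attempt failed. Reflection on what went wrong:\n"
            f"{self.last_reflection}\n"
            "Try again, avoiding these mistakes."
        )
```

Computing the reflection inside `reset()` keeps the failed trajectory available to `reflect` before it is cleared for the next attempt.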

This could be integrated into OpenDevin itself, allowing for retries in the main app as well.

Additional context:

Aider superissue
Thanks to @xingyaoww and @frankxu2004 for offline discussion

Jiayi-Pan commented 3 months ago

Thanks for creating the issue! Although I don't have much spare bandwidth lately, I am definitely interested in bringing EvalRefineResetStrategy and the retry functionality into OpenDevin. I will keep an eye on this issue and contribute once I have the time.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.


All-Hands-AI / OpenHands

[Feature]: Retry on failure functionality #2221