All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev

[Bug]: Issues w/ CodeAct + gpt-4-turbo #1710

Closed vedtam closed 4 months ago

vedtam commented 4 months ago

Is there an existing issue for the same bug?

Describe the bug

Hi All,

Thanks for your initiative. Please forgive my grumpy title and comment, but despite sharing your vision, I was unable to produce any useful code after trying for almost an hour. I've tried gpt-3.5-turbo and gpt-4-turbo; what I get is mostly an endless loop where the model is unaware of the context. It keeps giving back instructions, really verbose ones, burning through my tokens without any actual solutions. Frequently the conversation gets stuck in an endless loop while the pause/stop button is unresponsive, and I had to kill Docker to terminate it.

Just so many issues; how could this have ended up in a stable release?

Current Version

0.5.2

Installation and Configuration

Using the installation steps from the docs.

Model and Agent

Reproduction Steps

No response

Logs, Errors, Screenshots, and Additional Context

No response

SmartManoj commented 4 months ago

Could you provide any logs?

rbren commented 4 months ago

This is a bit strange--CodeAct + gpt-4-turbo works surprisingly well for me.

Can you give some examples of tasks you tried?

rbren commented 4 months ago

Also--are you sure you're on 0.5.5? We haven't released past 0.5.2 😄

And worth noting that nothing is stable right now--we're still on 0.x, hoping to get a 1.0 out in the next couple of months

vedtam commented 4 months ago

Sorry, I used 0.5 initially (copy/pasted the Docker command from the docs), then switched to 0.5.2, but the behavior remained the same.

I created a test.py file in the workspace, copied a relatively simple function from my project, and added a TODO comment after the function, describing a new function that needs to be implemented.

Then in the conversation I prompted: "open the test.py file from the workspace and follow the TODO comment describing your task"

It found the comment in the file, but instead of extending the script, it just gave a long explanation in the conversation, along with example code that "I'm supposed to implement".

On the second try, it entered an infinite loop repeating the same instructions, while the terminal complained about an indentation error.

I will repeat the exercise and eventually copy the output, but if you reproduce the above scenario it should be quite close to what I did.

rbren commented 4 months ago

I wonder if this prompt is the issue:

Then in the conversation I prompted: "open the test.py file from the workspace and follow the TODO comment describing your task"

IIUC the internal prompt refers to that text as its task, so the LLM might be confused about what "task" refers to.

SmartManoj commented 4 months ago

@vedtam

On the second try

1) Did you mean the next step after you asked to implement it? or 2) from Step 0 again?

vedtam commented 4 months ago

I'll be back at my office and will share the source file along with the prompt as soon as possible.

vedtam commented 4 months ago

In this new run, it's not far from completing the task, but as you can see, it asks me to go in and manually fix the indentation, and then it suggests what the finished code could look like:

test.py

from together import Together
from typing import Dict, Optional
from openai import OpenAI
import sentry_sdk
import json

def llamaChat(self, system_prompt: str, user_prompt: str)-> Optional[Dict]:
  messages = [{
    'role': 'system',
    'content': system_prompt
    },
    {
    'role': 'user',
    'content': user_prompt
  }]

  client = Together(api_key='19c1ada054415d7ad3c51809XXXX-XXXX')
  response = client.chat.completions.create(
      model='meta-llama/Llama-3-8b-chat-hf',
      messages=messages,
  )
  content = response.choices[0].message.content

  try:
    return json.loads(content)
  except Exception as e:
    data = llmJsonFix(content)

    if data:
      return data
    else:
      sentry_sdk.capture_message(f'Error parsing JSON in answer! Content: {content}, Exception: {e}')
      return None

# TODO: Implement the function llmJsonFix. This will use the openai library
# with the model set to "gpt-3.5-turbo" and fix the JSON dictionary in Llama's
# response if incorrect. llmJsonFix will attempt to repair the JSON object
# no more than 3 times before returning None!
def llmJsonFix(content: str)-> Optional[Dict]:
  # to be implemented
  return None
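
For reference, here is a rough sketch of what that TODO is asking for. This is my own illustration, not what the agent produced; it assumes the openai v1 client, reuses the imports already at the top of test.py, and uses a placeholder API key and prompt wording:

# Sketch only: one possible llmJsonFix, asking gpt-3.5-turbo to repair the
# malformed JSON and retrying at most 3 times before giving up.
def llmJsonFix(content: str) -> Optional[Dict]:
  client = OpenAI(api_key='sk-...')  # placeholder key, not a real credential
  for _ in range(3):  # at most 3 repair attempts
    response = client.chat.completions.create(
      model='gpt-3.5-turbo',
      messages=[
        {'role': 'system', 'content': 'Fix this text so it is valid JSON. Reply with JSON only.'},
        {'role': 'user', 'content': content},
      ],
    )
    content = response.choices[0].message.content
    try:
      return json.loads(content)
    except json.JSONDecodeError:
      continue
  return None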

prompt

open the test.py file from the workspace and follow the TODO comment describing the task you need to complete.

log

log.txt

screenshot

Screenshot 2024-05-11 at 20 59 46

vedtam commented 4 months ago

Just to note, when switching back to gpt-3.5-turbo (the above uses gpt-4), it stops at STEP 1 entirely:

Screenshot 2024-05-11 at 21 09 46

I'm wondering, are there some example projects I could try to reproduce, so I can eventually adapt my prompting style for better results?

reteps commented 4 months ago

As a random bystander to this, @vedtam, it looks like the tooling only supports 4-space indentation: "E111 indentation is not a multiple of four". Maybe you can disable that warning, or indent your file differently?
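
To make that concrete, here is a minimal illustration (my own, not from the original log) of how flake8/pycodestyle reacts to 2-space indentation with default settings:

# Every 2-space-indented line trips E111 when the checker expects multiples of 4:
def greet(name):
  return f'hello {name}'  # flake8 --select=E111 -> "E111 indentation is not a multiple of four"

# Reindenting with 4 spaces (or running flake8 with --extend-ignore=E111) silences it:
def greet_pep8(name):
    return f'hello {name}'  # no E111 reported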

SmartManoj commented 4 months ago

@vedtam, your file uses 2-space indentation. Why?

vedtam commented 4 months ago

I prefer 2 spaces. But the number of spaces shouldn't really matter.

SmartManoj commented 4 months ago

That violates PEP 8. What is the reason for your preference?

https://peps.python.org/pep-0008/#indentation

Also, could you mention your indentation style in the TODO and run it again?

vedtam commented 4 months ago

I mostly write JavaScript, where I use 2 spaces. Style guides are good, especially when working in a team, but Python should run with any number of spaces.

Practically, any of the issues I mentioned in this thread can be solved; I'm already looking at the source. I just wanted to highlight some I ran into during my first encounter with the library, so others know about them and can eventually look into them.

SmartManoj commented 4 months ago

Here is where E111 is selected: https://github.com/OpenDevin/OpenDevin/blob/9ccf17a63bcf2667c82b8de0e789e8075bf59137/opendevin/runtime/plugins/swe_agent_commands/cursors_edit_linting.sh#L40

SmartManoj commented 4 months ago

I mostly write JavaScript, where I use 2 spaces

Do you press Tab once, or type 2 spaces?

vedtam commented 4 months ago

One tab. Oh, I see, there's a linter; that makes sense.

rbren commented 4 months ago

Oh interesting. I actually dislike that we're shoving that opinion into CodeAct--I'm going to file a separate ticket for it.

I don't think there's much for us to solve in this issue--seems like a tricky prompt!--so I'm going to close it.