enyst closed this 1 week ago
Curiously, the LLM seems to actually be right: the prompt tells it that its action space includes:
send_msg_to_user(text: str)
Examples:
send_msg_to_user('Based on the results of my search, the city was built in 1751.')
and then:
Multiple actions can be provided at once, but will be executed sequentially without any feedback from the page.
More than 2-3 actions usually leads to failure or unexpected behavior. Example:
fill('a12', 'example with "quotes"')
click('a51')
click('48', button='middle', modifiers=['Shift'])
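To make the quoted prompt concrete, here is a minimal sketch of "multiple actions executed sequentially without any feedback". It assumes hypothetical stub versions of send_msg_to_user, fill and click that only record each call; these are not BrowserGym's real implementations, and run_actions is a hypothetical helper.

```python
import ast

# Hypothetical stubs standing in for the browser action functions;
# they only record each call instead of touching a page.
executed = []

def send_msg_to_user(text):
    executed.append(("send_msg_to_user", text))

def fill(bid, value):
    executed.append(("fill", bid, value))

def click(bid, button="left", modifiers=None):
    executed.append(("click", bid, button, tuple(modifiers or ())))

def run_actions(code):
    """Run each top-level statement of `code` in order; nothing is observed in between."""
    # One namespace per call: names assigned earlier in this response are
    # visible to later statements, but nothing survives into the next step.
    namespace = {"send_msg_to_user": send_msg_to_user, "fill": fill, "click": click}
    statements = ast.parse(code).body
    for stmt in statements:
        # Wrap a single statement in its own module so execution is strictly one at a time.
        single = ast.Module(body=[stmt], type_ignores=[])
        exec(compile(single, "<action>", "exec"), namespace)
    return len(statements)

count = run_actions(
    "fill('a12', 'example with \"quotes\"')\n"
    "click('a51')\n"
    "click('48', button='middle', modifiers=['Shift'])\n"
)
print(count)         # 3
print(executed[-1])  # ('click', '48', 'middle', ('Shift',))
```

Under this model, two send_msg_to_user calls in one response are simply two sequential statements, which is why the LLM's choice below looks reasonable.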
So... it seems fair that the LLM may decide to call send_msg_to_user twice? 😅
Example of an interesting response, just for thought:
To summarize the quality and cost information of various language models, I need to extract the relevant details from the text and the chart. I will start by extracting the text information and then focus on the chart.
# Extract text information
text_info = [
"Claude 3.5 Sonnet is the best by a fair amount, achieving a 27% resolve rate with the default agent in OpenHands.",
"GPT-4o lags behind, and o1-mini actually performed somewhat worse than GPT-4o. We went in and analyzed the results a little, and briefly it seemed like o1 was sometimes 'overthinking' things, performing extra environment configuration tasks when it could just go ahead and finish the task.",
"Finally, the strongest open models were Llama 3.1 405 B and deepseek-v2.5, and they performed reasonably, even besting some of the closed models."
]
# Extract chart information
# Since the chart is an image, I will need to click on it to get more details
click('112')
Note:
The task was to summarize the quality and cost of LLMs. The agent wants both the text, which it has summarized in text_info, and then, at the next step, a look at the chart. But the way the browsing agent currently works, it will lose text_info at the next step, because nothing except the commands (e.g. only click('112')) is carried over from step to step. Cc: @ketan1741
In my test, this led to a lot of time spent trying and losing, over and over, and it ended up stuck in a loop.
But the way the browsing agent currently works, it will lose text_info at the next step, because it never includes anything else from step to step, but the commands (e.g. only click('112')).
Yes, that's exactly how it works right now. We should look into ways to improve it. We could include at least the previous one or two observations, thoughts, and actions in the prompt for the next step.
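A minimal sketch of that idea, assuming the agent can keep a small rolling window of (thought, action, observation) per step and render it into the next prompt. Step and HistoryWindow are hypothetical names, not OpenHands classes, and the observation truncation length is an arbitrary assumption.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Step:
    """One browsing step: what the model thought, did, and then saw."""
    thought: str
    action: str
    observation: str

class HistoryWindow:
    """Hypothetical helper: keep only the last `max_steps` steps for the next prompt."""

    def __init__(self, max_steps=2, max_obs_chars=500):
        self.steps = deque(maxlen=max_steps)  # older steps are dropped automatically
        self.max_obs_chars = max_obs_chars

    def add(self, step):
        self.steps.append(step)

    def render(self):
        """Format the retained steps as text to prepend to the next prompt."""
        blocks = []
        for i, s in enumerate(self.steps, start=1):
            obs = s.observation[: self.max_obs_chars]  # page observations can be huge
            blocks.append(
                f"# Previous step {i}\n"
                f"Thought: {s.thought}\n"
                f"Action: {s.action}\n"
                f"Observation: {obs}"
            )
        return "\n\n".join(blocks)

history = HistoryWindow(max_steps=2)
history.add(Step("Extracted the text into text_info; now open the chart", "click('112')", "chart view"))
print(history.render())
```

Keeping the thought in the window is what would let the model still "see" something like text_info after clicking the chart; the deque's maxlen bounds prompt growth.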
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for over 30 days with no activity.
Is there an existing issue for the same bug?
Describe the bug and reproduction steps
Running the Browsing Agent with DeepSeek, I got a syntax error, and it turns out that what the LLM was trying to do is not necessarily "wrong", but we're not ready for it. Maybe we can address this by teaching our browsing agent the run ipython action? Or can we just send it as-is to BrowserGym?
LLM response:
The response did include "```python" too.
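As a rough illustration of what handling such a response could involve, here is a minimal sketch, assuming the whole response is Python optionally wrapped in a ```python fence. It strips the fence and lists the send_msg_to_user arguments that are not plain literals, i.e. the variables a literal-only parser would choke on. strip_fence and non_literal_msg_args are hypothetical helpers, not OpenHands code; ast.unparse needs Python 3.9+.

```python
import ast
import re

FENCE = "`" * 3  # three backticks
FENCE_RE = re.compile("^" + FENCE + r"(?:python)?[ \t]*\n(.*?)\n" + FENCE + r"\s*$", re.DOTALL)

def strip_fence(response):
    """Remove a surrounding ```python ... ``` fence if the LLM added one."""
    text = response.strip()
    match = FENCE_RE.match(text)
    return match.group(1) if match else text

def non_literal_msg_args(code):
    """Return the source of each send_msg_to_user argument that is not a literal."""
    problems = []
    for node in ast.walk(ast.parse(code)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "send_msg_to_user"):
            for arg in node.args:
                try:
                    ast.literal_eval(arg)  # accepts strings, numbers, lists, ...
                except ValueError:
                    problems.append(ast.unparse(arg))  # e.g. a variable reference
    return problems

response = FENCE + "python\ntext_info = ['27% resolve rate']\nsend_msg_to_user(text_info[0])\n" + FENCE
print(non_literal_msg_args(strip_fence(response)))  # ['text_info[0]']
```

Such a check could decide whether to reject the response or forward it for execution as Python; prose before or after the fence is not handled here.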
Result with our current implementation, which doesn't expect variables in send_msg_to_user:
OpenHands Installation
Development workflow
OpenHands Version
No response
Operating System
MacOS
Logs, Errors, Screenshots, and Additional Context
No response