anthropics / anthropic-quickstarts

A collection of projects designed to help developers quickly get started with building deployable applications using the Anthropic API
MIT License

Many errors with tools being invoked with empty-string args #137

Closed p-i- closed 1 week ago

p-i- commented 1 week ago

The attached screenshot demonstrates this happening with the bash tool, but it is happening with other tools as well.

This happens very frequently. It isn't a critical/crashing error, but it is a nuisance, and I suspect it could be solved by improving the schemas of the tools.

Screenshot 2024-10-29 at 15 30 00

Here's another:

Screenshot 2024-10-29 at 17 02 51
Quasimondo commented 1 week ago

I noticed that too - interestingly only since today and only when using the "computer" tool

p-i- commented 1 week ago

> I noticed that too - interestingly only since today and only when using the "computer" tool

Nah they're everywhere -- thick as thieves! Just as I write:

Screenshot 2024-10-29 at 20 40 56

That's followed by 14 failed attempts:

Tool Use: bash    Input: {'command': ''}   (x8)
Tool Use: bash    Input: {}
Tool Use: bash    Input: {'restart': ''}
Tool Use: bash    Input: {'command': 'cd ~/dev/foo && git branch'}  # bash has exited with returncode 2
I apologize for the tool issues.
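
For anyone who wants to count or short-circuit these calls client-side, here is a minimal sketch of a check for "empty" tool inputs like the ones in the log above. The helper name `is_empty_tool_input` is hypothetical and not part of the quickstart; it only assumes the tool input arrives as a plain dict.

```python
def is_empty_tool_input(tool_input: dict) -> bool:
    """Return True when no argument carries a usable value.

    Hypothetical helper: treats {}, {'command': ''} and {'restart': ''}
    as empty, matching the failed attempts shown in the log.
    """
    for value in tool_input.values():
        if isinstance(value, str):
            if value.strip():
                return False  # a non-blank string counts as a real argument
        elif value is not None:
            return False  # any non-string, non-None value (e.g. True) counts
    return True


attempts = [
    {"command": ""},
    {},
    {"restart": ""},
    {"command": "cd ~/dev/foo && git branch"},
]
flags = [is_empty_tool_input(a) for a in attempts]
print(flags)
```

A loop could use a check like this to return a descriptive tool error instead of running the tool, though that only hides the symptom rather than fixing the cause.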
Quasimondo commented 1 week ago

Sorry to hear that - on my side so far it is only the "computer" actions that require parameters that fail, but it looks like this is really just an issue of Claude having an insufficient understanding of how it works. I was able to "fix" it by asking it to analyze the source code of computer.py and learn how each action needs to be properly called. According to its analysis the problem is a lack of concrete examples of how to call the actions in the system prompt.

This is Claude's introspection:

Looking at the source code, I can now see exactly how the tool is implemented and what are the correct parameter formats. Here are the key findings:

Parameter Types (lines 111-114):
  - action: Action
  - text: str | None = None
  - coordinate: tuple[int, int] | None = None

Action Requirements:

For mouse_move and left_click_drag (lines 116-135):
  - REQUIRES coordinate parameter
  - Must be a tuple/list of 2 non-negative integers
  - Does NOT accept text parameter

For key and type (lines 137-157):
  - REQUIRES text parameter
  - Must be a string
  - Does NOT accept coordinate parameter
  - For type, includes automatic delays between keystrokes

For other actions (lines 159-193): "left_click" "right_click" "double_click" "middle_click" "screenshot" "cursor_position"
  - Do NOT accept either text or coordinate parameters
  - Only require the action parameter

Important Implementation Details:
  - Screen scaling is implemented (lines 237-260)
  - Screen resolution is set to 1024x768 (line 43)
  - All mouse actions are synchronized (using --sync flag)
  - Screenshots have a 2-second delay (line 76)
  - Typing has a 12ms delay between keystrokes (line 18)

This explains why some of my previous calls might have failed. For example:
  - Coordinates must be integers, not floats
  - Parameters must be strictly matched to actions
  - Coordinates can't exceed screen boundaries
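
The rules in that summary can be written down as a small standalone check. This is only a sketch of the rules as Claude described them, not the actual code in computer.py; the function name `validate_computer_call` and the error strings are made up for illustration.

```python
# Action groups as listed in the summary above.
COORDINATE_ACTIONS = {"mouse_move", "left_click_drag"}
TEXT_ACTIONS = {"key", "type"}
BARE_ACTIONS = {
    "left_click", "right_click", "double_click",
    "middle_click", "screenshot", "cursor_position",
}


def validate_computer_call(action, text=None, coordinate=None) -> list[str]:
    """Return a list of problems with a call; an empty list means it looks valid."""
    errors = []
    if action in COORDINATE_ACTIONS:
        if text is not None:
            errors.append(f"text is not accepted for {action}")
        ok = (
            isinstance(coordinate, (list, tuple))
            and len(coordinate) == 2
            # bool is a subclass of int, so exclude it explicitly
            and all(type(v) is int and v >= 0 for v in coordinate)
        )
        if not ok:
            errors.append("coordinate must be two non-negative integers")
    elif action in TEXT_ACTIONS:
        if coordinate is not None:
            errors.append(f"coordinate is not accepted for {action}")
        if not isinstance(text, str):
            errors.append("text must be a string")
    elif action in BARE_ACTIONS:
        if text is not None or coordinate is not None:
            errors.append(f"{action} takes no text or coordinate")
    else:
        errors.append(f"unknown action: {action!r}")
    return errors


print(validate_computer_call("mouse_move", coordinate=[100, 200]))
print(validate_computer_call("mouse_move", coordinate=[100.5, 200]))
```

The first call returns an empty list and the second reports the float coordinate. The screen-boundary rule is left out here because it depends on the configured resolution.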

Quasimondo commented 1 week ago

I added these lines to the SYSTEM_PROMPT and it seems to help:

<COMPUTER_TOOL>
* The computer tool requires exact parameter formatting:
  - Action: Required for all calls (one of: key, type, mouse_move, left_click, left_click_drag, right_click, middle_click, double_click, screenshot, cursor_position)
  - Text: Required only for "key" and "type" actions
  - Coordinate: Required only for "mouse_move" and "left_click_drag" actions, must be [x, y] array with integer values

* Example correct calls:
  - Screenshot: {{"action": "screenshot"}}
  - Mouse move: {{"action": "mouse_move", "coordinate": [100, 200]}}
  - Left click: {{"action": "left_click"}}
  - Type text: {{"action": "type", "text": "Hello world"}}
  - Press key: {{"action": "key", "text": "Return"}}

* Common mistakes to avoid:
  - Do not mix text and coordinate parameters
  - Do not use floats for coordinates
  - Do not provide unnecessary parameters
  - Do not exceed screen boundaries (1024x768)

* Best practices:
  - Take screenshot first to verify layout
  - Check cursor_position before moving
  - Use integers for all coordinates
  - Wait for actions to complete
  - Verify positions visually before clicking
</COMPUTER_TOOL>
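
A note on the doubled braces in the example calls: they matter if the block is pasted into a Python f-string, which I am assuming is how SYSTEM_PROMPT is defined. In an f-string, `{{` and `}}` render as literal `{` and `}`, so the model sees ordinary JSON. A minimal sketch, with `SCREEN` as a hypothetical interpolated value:

```python
import json

SCREEN = "1024x768"  # hypothetical value, only here to make this a real f-string

snippet = f"""<COMPUTER_TOOL>
* Example correct calls:
  - Mouse move: {{"action": "mouse_move", "coordinate": [100, 200]}}
* Do not exceed screen boundaries ({SCREEN})
</COMPUTER_TOOL>"""

# The third line of the rendered text holds the example call.
rendered_line = snippet.splitlines()[2]
print(rendered_line)

# Everything after the first ": " is the JSON the model will actually see.
payload = json.loads(rendered_line.split(": ", 1)[1])
print(payload["coordinate"])
```

If the prompt is a plain (non-f) string instead, the doubled braces would reach the model verbatim, so single braces should be used there.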
x5a commented 1 week ago

This was an intermittent issue yesterday across the API caused by this incident.