ErikBjare / gptme

Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision.
https://gptme.org/docs/
MIT License
2.64k stars · 180 forks

feat: implemented use of OpenAI's `tools` API #219

Open ErikBjare opened 1 month ago

ErikBjare commented 1 month ago

Made an attempt at https://github.com/ErikBjare/gptme/issues/218; it was pretty easy to get this minimal version working.

This would improve system prompt adherence, fixing issues like:


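For context, this is roughly the shape of OpenAI's tools API that the PR wires up. A minimal sketch using the OpenAI Python SDK; the `execute_shell` tool and its parameters are hypothetical stand-ins, not gptme's actual tool definitions:

```python
from openai import OpenAI

client = OpenAI()

# A tool definition in OpenAI's `tools` format (JSON Schema parameters).
# `execute_shell` is a hypothetical stand-in for a gptme tool.
tools = [
    {
        "type": "function",
        "function": {
            "name": "execute_shell",
            "description": "Run a shell command and return its output",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "The command to run"}
                },
                "required": ["command"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "list the files in the current directory"}],
    tools=tools,
)

# With tools enabled, the model can return structured tool calls
# instead of (or alongside) free-form text.
for tool_call in response.choices[0].message.tool_calls or []:
    print(tool_call.function.name, tool_call.function.arguments)
```

The payoff over prompt-described tool formats is that the model is constrained to emit well-formed, named tool calls with JSON arguments, which is what should improve adherence, especially for smaller models.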
> [!IMPORTANT]
> Adds support for OpenAI's Tools API, integrating tool usage into chat flow with tool execution and response handling.

  • Behavior:
    • Adds support for OpenAI's Tools API in step() in chat.py and reply() in llm.py.
    • Parses tool call responses in step() using regex to extract tool name and arguments.
    • Integrates tool execution in chat flow, yielding tool execution results.
  • Functions:
    • Adds build_tools_dict() and spec2tool() in llm.py to construct tool specifications (see the sketches after this summary).
    • Modifies reply(), _chat_complete(), and _stream() in llm.py to accept and handle tools.
  • Misc:
    • Updates chat() and stream() in llm_openai.py and llm_anthropic.py to accept a tools parameter and pass it to API calls.
    • Updates toolcall_re regex in tools/base.py to parse tool calls.

This description was created by Ellipsis for 4a11c8b7130750b3d017dc9d0419805dfd61f66b. It will automatically update as commits are pushed.
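To make the summary concrete, here is a minimal sketch of what `spec2tool()` might look like. The `ToolSpec` attributes used (`name`, `desc`, `parameters`) are assumptions about gptme's internals, not confirmed by this PR:

```python
def spec2tool(spec) -> dict:
    """Convert a gptme ToolSpec into an OpenAI tool definition.

    Sketch only: assumes ToolSpec exposes `name`, `desc`, and a
    JSON-Schema-like `parameters` dict; gptme's actual ToolSpec
    may differ.
    """
    return {
        "type": "function",
        "function": {
            "name": spec.name,
            "description": spec.desc,
            "parameters": spec.parameters or {"type": "object", "properties": {}},
        },
    }
```

And a sketch of the execution and response-handling side described for step(). Note the PR itself parses tool calls with a regex (`toolcall_re`); this sketch instead reads OpenAI's structured `tool_calls` field, and `execute_tool()` is a hypothetical dispatcher:

```python
import json

def handle_tool_calls(message, execute_tool) -> list[dict]:
    """Run each tool call from an assistant message and build the
    tool-role result messages to send back to the model."""
    results = []
    for tool_call in message.tool_calls or []:
        args = json.loads(tool_call.function.arguments)
        output = execute_tool(tool_call.function.name, args)
        # Results are linked to the originating call via tool_call_id
        results.append(
            {"role": "tool", "tool_call_id": tool_call.id, "content": output}
        )
    return results
```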

codecov-commenter commented 1 month ago

:x: 7 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --------------- | ------ | ------ | ------- |
| 80              | 7      | 73     | 0       |
View the top 3 failed tests by shortest run time:

`tests.test_server test_api_conversation_generate`

Stack Traces | 0.216s run time

```
conv = 'test-server-807350', client = <FlaskClient <Flask 'gptme.server.api'>>

    @pytest.mark.slow
    def test_api_conversation_generate(conv: str, client: FlaskClient):
        # Ask the assistant to generate a test response
        response = client.post(
            f"/api/conversations/{conv}",
            json={"role": "user", "content": "hello, just testing"},
        )
        assert response.status_code == 200

        response = client.post(
            f"/api/conversations/{conv}/generate",
            json={"model": get_model().model},
        )
>       assert response.status_code == 200
E       assert 500 == 200
E        +  where 500 = <WrapperTestResponse streamed [500 INTERNAL SERVER ERROR]>.status_code

tests/test_server.py:78: AssertionError
```

`tests.test_cli test_subagent`

Stack Traces | 2.13s run time

```
args = ['--name', 'test-8882-test_subagent', 'test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`']
runner = <click.testing.CliRunner object at 0x7fd39bb92ad0>

    @pytest.mark.slow
    @pytest.mark.flaky(retries=2, delay=5)
    def test_subagent(args: list[str], runner: CliRunner):
        # f14: 377
        # f15: 610
        # f16: 987
        args.append(
            "test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`"
        )
        print(f"running: gptme {' '.join(args)}")
        result = runner.invoke(gptme.cli.main, args)
        print(result.output)

        # apparently this is not obviously 610
        accepteds = ["377", "610"]

>       assert any([accepted in result.output for accepted in accepteds])
E       assert False
E        +  where False = any([False, False])

.../gptme/tests/test_cli.py:322: AssertionError
```

`tests.test_cli test_generate_primes`

Stack Traces | 2.29s run time

```
args = ['--name', 'test-27503-test_generate_primes', 'compute the first 10 prime numbers']
runner = <click.testing.CliRunner object at 0x7f3a0640fa00>

    @pytest.mark.slow
    def test_generate_primes(args: list[str], runner: CliRunner):
        args.append("compute the first 10 prime numbers")
        result = runner.invoke(gptme.cli.main, args)
        # check that the 9th and 10th prime is present
>       assert "23" in result.output
E       assert '23' in '[17:24:50] Using logdir \n .../gptme/logs/test-27503-test_generate_primes \n Using workspace at /tmp/tmpq3_fmlrr \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r \r'
E        +  where '[17:24:50] Using logdir \n .../gptme/logs/test-27503-test_generate_primes \n Using workspace at /tmp/tmpq3_fmlrr \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r \r' = <Result BadRequestError("Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages: text content blocks must be non-empty'}}")>.output

.../gptme/tests/test_cli.py:252: AssertionError
```

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard
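The `test_generate_primes` failure above ends in an Anthropic 400 error ("messages: text content blocks must be non-empty"), which suggests the new flow can produce an assistant message with empty text. A hedged sketch of the kind of guard that could avoid this; the content-block shapes are assumptions about Anthropic's message format, and the helper name is invented:

```python
def strip_empty_text_blocks(messages: list[dict]) -> list[dict]:
    """Drop empty text content before sending messages to Anthropic.

    Sketch only: assumes Anthropic-style messages whose content is
    either a string or a list of {"type": ..., "text": ...} blocks.
    """
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            content = [
                block
                for block in content
                if block.get("type") != "text" or block.get("text", "").strip()
            ]
            if not content:
                continue  # drop messages left with no blocks at all
            msg = {**msg, "content": content}
        elif isinstance(content, str) and not content.strip():
            continue  # drop messages with empty string content
        cleaned.append(msg)
    return cleaned
```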

jrmi commented 3 days ago

Hey @ErikBjare, are you planning to continue this soon? If not, would you be okay with me picking it up (on top of your changes, obviously)? I believe this MR could significantly improve model performance, especially for smaller models, and prevent some random small bugs. :pray:

ErikBjare commented 3 days ago

@jrmi I'm kinda working on other stuff right now. No immediate plans to pick this up.

Feel free to give it a shot!