ErikBjare / gptme

Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision.
https://gptme.org/docs/
MIT License
2.64k stars · 180 forks

feat: implemented use of OpenAI's `tools` API #219

Open ErikBjare opened 1 month ago

ErikBjare commented 1 month ago

Made an attempt at https://github.com/ErikBjare/gptme/issues/218; it was pretty easy to get this minimal version working.

This would improve system prompt adherence, fixing issues like:


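For context, this is roughly the shape of OpenAI's tools API that the PR wires up. A minimal sketch using the OpenAI Python SDK; the `execute_shell` tool and its parameters are hypothetical stand-ins, not gptme's actual tool definitions:

```python
from openai import OpenAI

client = OpenAI()

# A tool definition in OpenAI's `tools` format (JSON Schema parameters).
# `execute_shell` is a hypothetical stand-in for a gptme tool.
tools = [
    {
        "type": "function",
        "function": {
            "name": "execute_shell",
            "description": "Run a shell command and return its output",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "The command to run"}
                },
                "required": ["command"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "list the files in the current directory"}],
    tools=tools,
)

# With tools enabled, the model can return structured tool calls
# instead of (or alongside) free-form text.
for tool_call in response.choices[0].message.tool_calls or []:
    print(tool_call.function.name, tool_call.function.arguments)
```

The payoff over prompt-described tool formats is that the model is constrained to emit well-formed, named tool calls with JSON arguments, which is what should improve adherence, especially for smaller models.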
> [!IMPORTANT]
> Adds support for OpenAI's Tools API, integrating tool usage into chat flow with tool execution and response handling.

  • Behavior:
    • Adds support for OpenAI's Tools API in step() in chat.py and reply() in llm.py.
    • Parses tool call responses in step() using regex to extract tool name and arguments.
    • Integrates tool execution in chat flow, yielding tool execution results.
  • Functions:
    • Adds build_tools_dict() and spec2tool() in llm.py to construct tool specifications (see the sketches after this summary).
    • Modifies reply(), _chat_complete(), and _stream() in llm.py to accept and handle tools.
  • Misc:
    • Updates chat() and stream() in llm_openai.py and llm_anthropic.py to accept a tools parameter and pass it to API calls.
    • Updates toolcall_re regex in tools/base.py to parse tool calls.

This description was created by Ellipsis for 4a11c8b7130750b3d017dc9d0419805dfd61f66b. It will automatically update as commits are pushed.
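To make the summary concrete, here is a minimal sketch of what `spec2tool()` might look like. The `ToolSpec` attributes used (`name`, `desc`, `parameters`) are assumptions about gptme's internals, not confirmed by this PR:

```python
def spec2tool(spec) -> dict:
    """Convert a gptme ToolSpec into an OpenAI tool definition.

    Sketch only: assumes ToolSpec exposes `name`, `desc`, and a
    JSON-Schema-like `parameters` dict; gptme's actual ToolSpec
    may differ.
    """
    return {
        "type": "function",
        "function": {
            "name": spec.name,
            "description": spec.desc,
            "parameters": spec.parameters or {"type": "object", "properties": {}},
        },
    }
```

And a sketch of the execution and response-handling side described for step(). Note the PR itself parses tool calls with a regex (`toolcall_re`); this sketch instead reads OpenAI's structured `tool_calls` field, and `execute_tool()` is a hypothetical dispatcher:

```python
import json

def handle_tool_calls(message, execute_tool) -> list[dict]:
    """Run each tool call from an assistant message and build the
    tool-role result messages to send back to the model."""
    results = []
    for tool_call in message.tool_calls or []:
        args = json.loads(tool_call.function.arguments)
        output = execute_tool(tool_call.function.name, args)
        # Results are linked to the originating call via tool_call_id
        results.append(
            {"role": "tool", "tool_call_id": tool_call.id, "content": output}
        )
    return results
```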

codecov-commenter commented 1 month ago

:x: 7 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --------------- | ------ | ------ | ------- |
| 80              | 7      | 73     | 0       |
View the top 3 failed tests by shortest run time:

`tests.test_server test_api_conversation_generate`

Stack Traces | 0.216s run time

```
conv = 'test-server-807350', client = <FlaskClient <Flask 'gptme.server.api'>>

    @pytest.mark.slow
    def test_api_conversation_generate(conv: str, client: FlaskClient):
        # Ask the assistant to generate a test response
        response = client.post(
            f"/api/conversations/{conv}",
            json={"role": "user", "content": "hello, just testing"},
        )
        assert response.status_code == 200

        response = client.post(
            f"/api/conversations/{conv}/generate",
            json={"model": get_model().model},
        )
>       assert response.status_code == 200
E       assert 500 == 200
E        +  where 500 = <WrapperTestResponse streamed [500 INTERNAL SERVER ERROR]>.status_code

tests/test_server.py:78: AssertionError
```

`tests.test_cli test_subagent`

Stack Traces | 2.13s run time

```
args = ['--name', 'test-8882-test_subagent', 'test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`']
runner = <click.testing.CliRunner object at 0x7fd39bb92ad0>

    @pytest.mark.slow
    @pytest.mark.flaky(retries=2, delay=5)
    def test_subagent(args: list[str], runner: CliRunner):
        # f14: 377
        # f15: 610
        # f16: 987
        args.append(
            "test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`"
        )
        print(f"running: gptme {' '.join(args)}")
        result = runner.invoke(gptme.cli.main, args)
        print(result.output)

        # apparently this is not obviously 610
        accepteds = ["377", "610"]

>       assert any([accepted in result.output for accepted in accepteds])
E       assert False
E        +  where False = any([False, False])

.../gptme/tests/test_cli.py:322: AssertionError
```

`tests.test_cli test_generate_primes`

Stack Traces | 2.29s run time

```
args = ['--name', 'test-27503-test_generate_primes', 'compute the first 10 prime numbers']
runner = <click.testing.CliRunner object at 0x7f3a0640fa00>

    @pytest.mark.slow
    def test_generate_primes(args: list[str], runner: CliRunner):
        args.append("compute the first 10 prime numbers")
        result = runner.invoke(gptme.cli.main, args)
        # check that the 9th and 10th prime is present
>       assert "23" in result.output
E       assert '23' in '[17:24:50] Using logdir \n .../gptme/logs/test-27503-test_generate_primes \n Using workspace at /tmp/tmpq3_fmlrr \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r \r'
E        +  where '[17:24:50] Using logdir \n .../gptme/logs/test-27503-test_generate_primes \n Using workspace at /tmp/tmpq3_fmlrr \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r \r' = <Result BadRequestError("Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages: text content blocks must be non-empty'}}")>.output

.../gptme/tests/test_cli.py:252: AssertionError
```

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard
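The `test_generate_primes` failure above ends in an Anthropic 400 error ("messages: text content blocks must be non-empty"), which suggests the new flow can produce an assistant message with empty text. A hedged sketch of the kind of guard that could avoid this; the content-block shapes are assumptions about Anthropic's message format, and the helper name is invented:

```python
def strip_empty_text_blocks(messages: list[dict]) -> list[dict]:
    """Drop empty text content before sending messages to Anthropic.

    Sketch only: assumes Anthropic-style messages whose content is
    either a string or a list of {"type": ..., "text": ...} blocks.
    """
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            content = [
                block
                for block in content
                if block.get("type") != "text" or block.get("text", "").strip()
            ]
            if not content:
                continue  # drop messages left with no blocks at all
            msg = {**msg, "content": content}
        elif isinstance(content, str) and not content.strip():
            continue  # drop messages with empty string content
        cleaned.append(msg)
    return cleaned
```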

jrmi commented 3 days ago

Hey @ErikBjare, are you planning to continue this soon? If not, would you be okay with me picking it up (on top of your changes, obviously)? I believe this MR could significantly improve model performance, especially for smaller models, and prevent some random small bugs. :pray:

ErikBjare commented 3 days ago

@jrmi I'm kinda working on other stuff right now. No immediate plans to pick this up.

Feel free to give it a shot!