Open ErikBjare opened 1 month ago
Tests completed | Failed | Passed | Skipped |
---|---|---|---|
80 | 7 | 73 | 0 |
tests.test_cli test_subagent
Stack Traces | 2.13s run time
> ```
> args = ['--name', 'test-8882-test_subagent', 'test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`']
> runner = <click.testing.CliRunner object at 0x7fd39bb92ad0>
>
>     @pytest.mark.slow
>     @pytest.mark.flaky(retries=2, delay=5)
>     def test_subagent(args: list[str], runner: CliRunner):
>         # f14: 377
>         # f15: 610
>         # f16: 987
>         args.append(
>             "test the subagent tool by computing `fib(15)` with it, where `fib(1) = 1` and `fib(2) = 1`"
>         )
>         print(f"running: gptme {' '.join(args)}")
>         result = runner.invoke(gptme.cli.main, args)
>         print(result.output)
>
>         # apparently this is not obviously 610
>         accepteds = ["377", "610"]
>
> >       assert any([accepted in result.output for accepted in accepteds])
> E       assert False
> E        +  where False = any([False, False])
>
> .../gptme/tests/test_cli.py:322: AssertionError
> ```

tests.test_cli test_generate_primes
Stack Traces | 2.29s run time
> ```
> args = ['--name', 'test-27503-test_generate_primes', 'compute the first 10 prime numbers']
> runner = <click.testing.CliRunner object at 0x7f3a0640fa00>
>
>     @pytest.mark.slow
>     def test_generate_primes(args: list[str], runner: CliRunner):
>         args.append("compute the first 10 prime numbers")
>         result = runner.invoke(gptme.cli.main, args)
>         # check that the 9th and 10th prime is present
>
> >       assert "23" in result.output
> E       assert '23' in '[17:24:50] Using logdir \n .../gptme/logs/test-27503-test_generate_primes \n Using workspace at /tmp/tmpq3_fmlrr \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r \r'
> E        +  where '[17:24:50] Using logdir \n .../gptme/logs/test-27503-test_generate_primes \n Using workspace at /tmp/tmpq3_fmlrr \nUser: compute the first 10 prime numbers\nAssistant: Here is how we can compute the first 10 prime numbers in Python:\nUser: compute the first 10 prime numbers\nAssistant: \nSkipped 1 hidden system messages, show with --show-hidden\n--- ^^^ past messages ^^^ ---\nUser: compute the first 10 prime numbers\nAssistant: Thinking...\r \r' = <Result BadRequestError("Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'messages: text content blocks must be non-empty'}}")>.output
>
> .../gptme/tests/test_cli.py:252: AssertionError
> ```
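The `test_generate_primes` trace shows the request being rejected with `text content blocks must be non-empty`, and the captured log contains an `Assistant:` entry with no text. One possible mitigation, sketched below, is to drop messages with empty content before building the request. The `Message` class and `drop_empty_messages` helper are hypothetical names for illustration only, not gptme's actual API, and whether this is the right fix for this PR is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical stand-in for a chat message; not gptme's actual class.
    role: str
    content: str


def drop_empty_messages(messages: list[Message]) -> list[Message]:
    """Return only the messages whose content has non-whitespace text."""
    return [m for m in messages if m.content.strip()]


# Simplified log modeled on the captured test output above.
log = [
    Message("user", "compute the first 10 prime numbers"),
    Message("assistant", ""),  # the empty reply visible in the trace
    Message("user", "compute the first 10 prime numbers"),
]
cleaned = drop_empty_messages(log)
print([m.role for m in cleaned])
```

Note that dropping the empty assistant message leaves two consecutive `user` messages here, so a caller might also need to merge or deduplicate adjacent messages with the same role.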
Hey @ErikBjare, are you planning to continue this soon? If not, would you be okay with me picking it up (on top of your changes, obviously)? I believe this MR could significantly improve model performance, especially for smaller models, and prevent some random small bugs. :pray:
@jrmi I'm kinda working on other stuff right now. No immediate plans to pick this up.
Feel free to give it a shot!
Made an attempt at https://github.com/ErikBjare/gptme/issues/218, was pretty easy to get this minimal thing working.
should we register ipython-functions too?
leave it to the ipython tool

Would improve system prompt adherence, fixing issues like: