ErikBjare / gptme

Your agent in your terminal, equipped with local tools: it writes code, uses the terminal, browses the web, and has vision.
https://gptme.org/docs/
MIT License

refactor: refactor tools, codeblock, and tooluse #113

Closed: ErikBjare closed this 3 weeks ago

ErikBjare commented 3 weeks ago

Trying to reduce the size of gptme/tools/__init__.py

TODO
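
For context, one way such a split could look; this is a minimal sketch, and the module layout, class, and method names are assumptions for illustration, not necessarily what this PR ends up doing:

```python
# Hypothetical post-refactor layout (names are illustrative):
#
#   gptme/codeblock.py       - Codeblock dataclass + markdown parsing
#   gptme/tools/__init__.py  - thin re-exports only, no logic
from dataclasses import dataclass

FENCE = "`" * 3  # literal triple-backtick, spelled out so this example nests in markdown


@dataclass(frozen=True)
class Codeblock:
    """A fenced code block extracted from a markdown message."""

    lang: str
    content: str

    @classmethod
    def from_markdown(cls, markdown: str) -> list["Codeblock"]:
        """Collect all fenced code blocks from a markdown string."""
        blocks: list[Codeblock] = []
        lang: str | None = None
        lines: list[str] = []
        for line in markdown.splitlines():
            if line.startswith(FENCE):
                if lang is None:  # opening fence: remember the language tag
                    lang = line[len(FENCE):].strip()
                    lines = []
                else:  # closing fence: emit the finished block
                    blocks.append(cls(lang, "\n".join(lines)))
                    lang = None
            elif lang is not None:
                lines.append(line)
        return blocks
```

`gptme/tools/__init__.py` could then shrink to re-exports (`from ..codeblock import Codeblock`, etc.), keeping the public interface stable while the implementation moves out.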

ellipsis-dev[bot] commented 3 weeks ago

Your free trial has expired. To keep using Ellipsis, sign up at https://app.ellipsis.dev for $20/seat/month or reach us at help@ellipsis.dev

codecov-commenter commented 3 weeks ago

❌ 1 test failed:

| Tests completed | Failed | Passed | Skipped |
| --------------- | ------ | ------ | ------- |
| 71              | 1      | 70     | 0       |
Top failed test (by shortest run time): `tests.test_eval::test_eval_cli`

Stack trace (31.3s run time):

```
@pytest.mark.slow
def test_eval_cli():
    runner = CliRunner()
    test_set = ["hello"]
    result = runner.invoke(
        main,
        [
            *test_set,
        ],
    )
    assert result
    assert result.exit_code == 0
>   assert "correct file" in result.output
E   AssertionError: assert 'correct file' in '=== Running evals ===\n=== Completed test hello ===\n=== Completed test hello ===\n=== Completed test hello ===\n\n=== Finished ===\n\n\n\n=== Model Results ===\n\nResults for model: openai/gpt-4o\nCompleted 1 tests in 0.00s:\n- hello in 0.00s (gen: 0.00s, run: 0.00s, eval: 0.00s)\n\nResults for model: anthropic/claude-3-5-sonnet-20240620\nCompleted 1 tests in 0.00s:\n- hello in 0.00s (gen: 0.00s, run: 0.00s, eval: 0.00s)\n\nResults for model: openrouter/meta-llama/llama-3.1-405b-instruct\nCompleted 1 tests in 0.00s:\n- hello in 0.00s (gen: 0.00s, run: 0.00s, eval: 0.00s)\n\n\n=== Model Comparison ===\nModel hello\n--------------------------------------------- --------\nopenai/gpt-4o ❌ 0.00s\nanthropic/claude-3-5-sonnet-20240620 ❌ 0.00s\nopenrouter/meta-llama/llama-3.1-405b-instruct ❌ 0.00s\n\nResults saved to .../gptme/eval_results/eval_results_20240909_202435.csv\n'
E    +  where '=== Running evals ===\n=== Completed test hello ===\n=== Completed test hello ===\n=== Completed test hello ===\n\n=== Finished ===\n\n\n\n=== Model Results ===\n\nResults for model: openai/gpt-4o\nCompleted 1 tests in 0.00s:\n- hello in 0.00s (gen: 0.00s, run: 0.00s, eval: 0.00s)\n\nResults for model: anthropic/claude-3-5-sonnet-20240620\nCompleted 1 tests in 0.00s:\n- hello in 0.00s (gen: 0.00s, run: 0.00s, eval: 0.00s)\n\nResults for model: openrouter/meta-llama/llama-3.1-405b-instruct\nCompleted 1 tests in 0.00s:\n- hello in 0.00s (gen: 0.00s, run: 0.00s, eval: 0.00s)\n\n\n=== Model Comparison ===\nModel hello\n--------------------------------------------- --------\nopenai/gpt-4o ❌ 0.00s\nanthropic/claude-3-5-sonnet-20240620 ❌ 0.00s\nopenrouter/meta-llama/llama-3.1-405b-instruct ❌ 0.00s\n\nResults saved to .../gptme/eval_results/eval_results_20240909_202435.csv\n' = <Result okay>.output

tests/test_eval.py:20: AssertionError
```
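
For local debugging, the failing assertion can be reproduced outside CI with click's CliRunner; the import below is an assumption about where the eval entrypoint lives, so adjust it to the actual module:

```python
from click.testing import CliRunner

# Assumed location of the eval CLI entrypoint; not taken from this repo's layout.
from gptme.eval.main import main

runner = CliRunner()
result = runner.invoke(main, ["hello"])

print(result.exit_code)                 # 0, so the CLI itself exits cleanly
print("correct file" in result.output)  # False in the failing run above
```

Note that the comparison table in the trace marks all three models ❌ with 0.00s timings, so the failure looks like the eval never produced a passing result rather than a crash in the CLI itself.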

To compare individual test run times against the main branch, see the Test Analytics Dashboard.