benchmark crush - Githubissues

BigArty commented 1 year ago

I was testing some new local LLMs when this error happened.

────────────────────────────────────────────── benchmark/exercism-python ───────────────────────────────────────────────
test-cases: 111
model: gpt-3.5-turbo
edit_format: whole
commit_hash: bcf4a55-dirty
num_error_outputs: 4
num_user_asks: 4
num_exhausted_context_windows 0
test_timeouts: 2

15.3% correct after try 0
25.2% correct after try 1

duration: 35.3 sec/test-case
costs: $0.0058/test-case, $0.65 total, $0.78 projected
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
main_model: gpt-3.5-turbo
edit_format: whole
fnames: benchmark/exercism-python/simple-cipher/simple_cipher.py
Model: gpt-3.5-turbo
Git repo: none
Repo-map: disabled
Added simple_cipher.py to the chat.
Traceback (most recent call last):
  File "benchmark/benchmark.py", line 696, in <module>
    app()
  File "/usr/local/lib/python3.8/site-packages/typer/main.py", line 328, in __call__
    raise e
  File "/usr/local/lib/python3.8/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/typer/core.py", line 716, in main
    return _main(
  File "/usr/local/lib/python3.8/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "benchmark/benchmark.py", line 325, in main
    results = run_test(
  File "benchmark/benchmark.py", line 578, in run_test
    coder.run(with_message=instructions)
  File "/usr/local/lib/python3.8/site-packages/aider/coders/base_coder.py", line 342, in run
    new_user_message = self.send_new_user_message(new_user_message)
  File "/usr/local/lib/python3.8/site-packages/aider/coders/base_coder.py", line 456, in send_new_user_message
    interrupted = self.send(messages, functions=self.functions)
  File "/usr/local/lib/python3.8/site-packages/aider/coders/base_coder.py", line 582, in send
    self.show_send_output(completion)
  File "/usr/local/lib/python3.8/site-packages/aider/coders/base_coder.py", line 635, in show_send_output
    show_resp = self.render_incremental_response(True)
  File "/usr/local/lib/python3.8/site-packages/aider/coders/wholefile_coder.py", line 25, in render_incremental_response
    return self.update_files(mode="diff")
  File "/usr/local/lib/python3.8/site-packages/aider/coders/wholefile_coder.py", line 52, in update_files
    output += self.do_live_diff(full_path, new_lines, True)
  File "/usr/local/lib/python3.8/site-packages/aider/coders/wholefile_coder.py", line 125, in do_live_diff
    if full_path.exists():
  File "/usr/local/lib/python3.8/pathlib.py", line 1407, in exists
    self.stat()
  File "/usr/local/lib/python3.8/pathlib.py", line 1198, in stat
    return self._accessor.stat(self)
OSError: [Errno 36] File name too long: '/aider/benchmark/exercism-python/simple-cipher/3. Modify the encode() function to take a key argument (defaulting to None). If no key is provided, call the generate_key() function to generate a new key. Use the key to substitute each letter in the plaintext with its corresponding letter in the key.'

paul-gauthier commented 1 year ago

Thanks for trying aider and reporting this issue.

This looks to me like perhaps the local LLM produced a badly formed edit response, which triggered an error in aider. This is probably because the LLM you were using isn't able to follow instructions well enough to work properly with aider.
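For illustration, the traceback in the report ends in `full_path.exists()`, where the "file name" is a full sentence of model output rather than a real path. The sketch below shows one way such a check could be made tolerant of that input. It is only an assumption about a possible guard, not aider's actual code or fix, and the helper name `safe_exists` is hypothetical.

```python
import errno
from pathlib import Path


def safe_exists(path: Path) -> bool:
    """Hypothetical helper: like Path.exists(), but treats an over-long
    file name as "does not exist" instead of raising OSError."""
    try:
        return path.exists()
    except OSError as err:
        # ENAMETOOLONG is the symbolic name for the "[Errno 36] File name
        # too long" error seen on Linux in the traceback.
        if err.errno == errno.ENAMETOOLONG:
            return False
        # Anything else (for example a permissions problem) is still surfaced.
        raise


# A stand-in for a line of prose that was mistaken for a file name.
# 300 characters exceeds the 255-byte per-component limit that common
# Linux filesystems enforce.
bogus = Path("/tmp") / ("x" * 300)
print(safe_exists(bogus))
```

With a guard like this, a malformed response would be treated as naming a file that does not exist, rather than aborting the whole benchmark run.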

If you can isolate the LLM response that is causing the error, we might be able to debug it further.

BigArty commented 1 year ago

Sadly (or fortunately), this problem only appeared once for me. I ran tests with several models and configurations, but this particular run aborted at the very end, and with one of the strongest models in my tests.

paul-gauthier commented 1 year ago

Ok, please let me know if you are able to reproduce. I'm going to close this issue for now, but feel free to re-open or file a new issue any time.

Aider-AI / aider

benchmark crush #238