Significant-Gravitas / Auto-GPT-Benchmarks

A repo built for the purpose of benchmarking the performance of agents, regardless of how they are set up and how they work.
MIT License
273 stars, 77 forks

Fix "code.py" conflict with Python's code module, and fix TestReturnCode_Simple conflict between two test.py files. #321

Closed: lc0rp closed this 1 year ago

lc0rp commented 1 year ago

1 - Rename "code.py" to "sample_code.py" to prevent an occasional conflict with Python's standard-library code module (https://docs.python.org/3/library/code.html). When this conflict occurs, the agent shows errors like "cannot import multiply_int".
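The name clash in point 1 can be reproduced with a minimal sketch. It does not use the benchmark's own code: the temporary workspace, the `resolve_origin` helper, and the body of `multiply_int` are all hypothetical. It only shows that a file called code.py competes with the standard-library module of the same name depending on search-path order, while sample_code.py does not.

```python
import sysconfig
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path

# Directory that holds the standard library, including the stdlib "code" module.
STDLIB_DIR = sysconfig.get_paths()["stdlib"]


def resolve_origin(module_name, search_path):
    """Return the file a module name resolves to for this search-path order."""
    spec = PathFinder.find_spec(module_name, search_path)
    return None if spec is None else Path(spec.origin).resolve()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp).resolve()
        # Placeholder body: the real multiply_int is not shown in the PR text.
        body = "def multiply_int(num, multiplier):\n    return num * multiplier\n"
        (workspace / "code.py").write_text(body)
        (workspace / "sample_code.py").write_text(body)
        ws = str(workspace)
        return {
            "workspace": workspace,
            # Workspace searched first: the local code.py wins.
            "code_workspace_first": resolve_origin("code", [ws, STDLIB_DIR]),
            # Stdlib searched first: "code" is Python's own module,
            # which has no multiply_int, so the import fails.
            "code_stdlib_first": resolve_origin("code", [STDLIB_DIR, ws]),
            # A unique name resolves to the workspace in either order.
            "sample_stdlib_first": resolve_origin("sample_code", [STDLIB_DIR, ws]),
        }


if __name__ == "__main__":
    for key, value in demo().items():
        print(f"{key}: {value}")
```

Which order applies in practice depends on how the agent launches Python, which would explain why the conflict is only occasional.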

2 - Fix TestReturnCode_Simple. There are three "test.py" files:
=> the one in artifacts_in, which the agent should manipulate
=> the one in artifacts_out, which is for --mock
=> the one in custom_python, which is used to test that everything worked

The issue is that the third one (custom_python/test.py) overwrites the first one (artifacts_in/test.py) when both are copied to the workspace. This causes all tests to fail, because the output of the original test.py differs from the output of custom_python/test.py (the LLM doesn't write the extra tests because it wasn't asked to do so).

I fixed it by renaming as follows:

I also changed data.json to reference testfile.py instead of test.py.
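The overwrite from point 2 can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual copy logic: `copy_into_workspace` and `run` are hypothetical helpers, the file contents are placeholders, and the sketch assumes that the agent-facing file is the one that ends up named testfile.py.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_workspace(source_dirs, workspace):
    """Copy every file from each source directory into workspace, in order."""
    for src_dir in source_dirs:
        for path in sorted(src_dir.iterdir()):
            # A later file with the same name silently replaces an earlier one.
            shutil.copy(path, workspace / path.name)


def run(agent_file_name):
    """Build a toy challenge layout and return the resulting workspace contents."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        artifacts_in = root / "artifacts_in"
        custom_python = root / "custom_python"
        workspace = root / "workspace"
        for directory in (artifacts_in, custom_python, workspace):
            directory.mkdir()
        # Placeholder contents standing in for the real files.
        (artifacts_in / agent_file_name).write_text("# file the agent should edit\n")
        (custom_python / "test.py").write_text("# checker\n")
        copy_into_workspace([artifacts_in, custom_python], workspace)
        return {p.name: p.read_text() for p in sorted(workspace.iterdir())}


if __name__ == "__main__":
    print(run("test.py"))      # same name in both directories: one file survives
    print(run("testfile.py"))  # distinct names: both files survive
```

With the same name in both directories, only the checker remains in the workspace; with distinct names, the agent's file and the checker coexist.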

PR Quality Checklist