Open dustinbyrne opened 2 months ago
Correct malformed test_errors in generate_code calls
The function generate_code is being called with malformed test_errors, causing the <test-errors> section in the generated input text (generate.input.txt) to be filled with incoherent characters and symbols.
The issue lies in the way test_errors is passed around and used in the code base. The malformed output suggests that either:
test_errors is garbled when being generated or formatted, or
test_errors are collected or aggregated incorrectly from different sources before being passed to generate_code.
Given that the trajectories confirm the test errors returned from summarize_test_errors are valid, the problem likely exists somewhere after the summarize_test_errors function is called and before the generate_code function is invoked with these errors.
solver/workflow/summarize_test_errors.py: Ensure summarize_test_errors is returning a properly formatted list of error strings by logging its output for verification.
solver/workflow/generate_and_validate_code.py: Modify the code that collects test_errors to ensure it aggregates the errors correctly. Add logging to verify the content of the test_errors set before passing it to generate_code.
solver/workflow/solve_code.py: Ensure that the test_errors parameter passed to generate_code is properly formatted. Add verification to check the integrity and format of test_errors.
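The "integrity and format" check could look roughly like the sketch below. The helper name validate_test_errors is hypothetical and does not exist in the solver code base; it only assumes that test_errors should be a collection of multi-character strings.

```python
from typing import Iterable, List


def validate_test_errors(test_errors: Iterable[str]) -> List[str]:
    """Return test_errors as a list, raising if it looks malformed.

    Hypothetical helper; not part of the solver code base.
    """
    # A bare string is itself iterable, so it would silently be treated as
    # a sequence of one-character "errors". Reject it explicitly.
    if isinstance(test_errors, (str, bytes)):
        raise TypeError("test_errors must be a collection of strings, not a single string")

    errors = list(test_errors)
    for error in errors:
        if not isinstance(error, str):
            raise TypeError(f"test error is not a string: {error!r}")

    # Only one-character entries suggests a string was split into characters.
    if len(errors) > 1 and all(len(error) == 1 for error in errors):
        raise ValueError("test_errors contains only single-character entries")

    return errors
```

Calling such a check just before generate_code would turn silently garbled input into an early, visible failure.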
Here are the suggested changes for each file component:
solver/workflow/summarize_test_errors.py:
Review the summarize_test_errors function to ensure it returns a correctly formatted string.
solver/workflow/generate_and_validate_code.py:
In the generate_and_validate_code function, ensure test_errors are aggregated correctly.
Add logging to inspect test_errors before they are passed to generate_code.
Here is the section to focus on:
    accumulator.extend(results)
    return accumulator

test_errors = set()

def collect_errors(work_dir: WorkDir, run_test_result: RunTestResult):
    if run_test_result.test_status == TestStatus.ERROR:
        if run_test_result.test_output:
            test_errors.update(
                summarize_test_errors(work_dir, run_test_result.test_output)
            )
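One cause worth ruling out in the collection code above: if summarize_test_errors returns a single string rather than a list of strings (an assumption, not something confirmed in this issue), set.update would add that string one character at a time. The sketch below uses a hypothetical stand-in function, summarize_as_string, to show the difference between set.update and set.add.

```python
# Hypothetical stand-in for summarize_test_errors, assumed here to
# return one string rather than a list of strings.
def summarize_as_string(test_output: str) -> str:
    return test_output.strip()


summary = summarize_as_string("E  AssertionError\n")

# set.update() iterates its argument, so a string is added character by character.
updated = set()
updated.update(summary)

# set.add() stores the whole string as a single element.
added = set()
added.add(summary)

print(sorted(updated))  # individual characters
print(added)            # one complete error message
```

If this turns out to be what is happening, switching to add (or wrapping the string in a list before update) would keep each summary intact.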
solver/workflow/solve_code.py:
Update the generate_code method to ensure test_errors is properly formatted and evaluated before being utilized.
Verify that the test_errors list is well-formed before it is used in generating the code.
Example section of code:
def generate_code(
    self, work_dir: WorkDir, plan: str, test_errors: List[str]
) -> Optional[Patch]:
    self.clean_git_state()

    generator = GenerateCode(
        self.log,
        work_dir,
        self.trajectory_file,
        plan,
        self.python_version,
        self.limits.file_limit,
    )

    def generate(attempt, lint_errors: List[str]):
        code = generator.generate(attempt, lint_errors, test_errors)
        # Add logging to inspect test_errors here
        self.log("debug", f"Test errors passed to code generation: {test_errors}")
        return generator.apply(attempt, code)
By verifying and correcting the test_errors at these points, you should be able to ensure that the generate_code calls receive properly formatted error messages, resulting in correctly generated input text.
I haven't been able to reproduce this :-(
To find examples of this issue, look for instances of <test-errors> within generate.input.txt logs. It would appear that generate is called with malformed test_errors.
The trajectories confirm the test errors returned from summarize_test_errors are valid.
E.g.: