kgilpin opened 2 months ago
Title: Improve Propagation of Test Failure Information to Code Generator
Problem: Currently, the code generator's iterative process does not effectively propagate information from previous attempts that could guide subsequent code generation attempts. Specifically, when test cases fail, the associated failure messages are not put to use, so the generator repeats the same mistakes. This can cause it to stall on a problem, as the recurring errors across patch attempts illustrate.
Analysis: Propagating information from test failures more effectively could improve how the code generator learns across iterations, allowing it to adapt its approach based on past attempts. The code generator should be aware of which patches have already been tried and why they failed. In particular, test failure messages should be captured and used to steer the LLM away from similar errors. Additionally, knowing whether a test that was expected to fail actually passed can reveal logic errors in the generated code.
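A minimal Python sketch of the per-attempt record described in the analysis, assuming a hypothetical PatchAttempt dataclass and a local TestStatus enum that stands in for the project's own (these names and the prompt wording are illustrative, not the generator's actual API). Each patch is stored with its outcome and failure messages, and the history is rendered as text that could be prepended to the next prompt.

```python
from dataclasses import dataclass, field
from enum import Enum


class TestStatus(Enum):
    # Hypothetical stand-in for the TestStatus values named in this issue.
    PASSED = "PASSED"
    FAILED = "FAILED"
    ERROR = "ERROR"


@dataclass
class PatchAttempt:
    """One prior attempt: the patch text, its test outcome, and why it failed."""
    patch: str
    status: TestStatus
    messages: list[str] = field(default_factory=list)


def format_history(attempts: list[PatchAttempt]) -> str:
    """Render prior attempts as prompt text so the LLM can avoid repeating them."""
    if not attempts:
        return ""
    lines = ["Previous attempts (do not repeat these mistakes):"]
    for i, attempt in enumerate(attempts, start=1):
        lines.append(f"Attempt {i}: {attempt.status.value}")
        lines.append(attempt.patch.strip())
        # Include each captured failure message so the LLM sees why it failed.
        for msg in attempt.messages:
            lines.append(f"  - {msg}")
    return "\n".join(lines)
```

Calling format_history on an empty list returns an empty string, so the first prompt is left unchanged.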
The main components that need consideration for capture and propagation include:
Proposed Changes:
- Modify generate_patch to keep a history of attempted patches alongside their outcomes.
- Extend test result handling not only to react when TestStatus.ERROR occurs, but also to gather and store specific test failure messages.
- Record TestStatus.PASSED or FAILED specifically for pass_to_fail scenarios, capturing these outcomes in detail for feedback into the generator.
- Update GenerateTest to accommodate the history of patch attempts and to use the test failure messages directly to avoid repeating errors.
- Ensure the generate function integrates test script error information to steer changes toward a successful patch, adjusting its iteration strategy based on explicitly documented prior failures.

By implementing these changes, the generator will better leverage historical data from prior attempts, reducing the likelihood of repeating the same mistakes and increasing the chances of generating a successful patch quickly.
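The loop below sketches how the proposed generate changes might fit together. It is a hypothetical illustration, not the project's code: propose_patch and run_tests are injected callables standing in for the LLM call and the test runner, and statuses are plain strings. Every attempt, with its failure messages, is passed back to the generator, and a patch identical to one already tried is recorded without re-running the tests.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Attempt:
    """A prior patch attempt (hypothetical record type)."""
    patch: str
    status: str  # assumed to be "PASSED", "FAILED" or "ERROR"
    messages: list[str] = field(default_factory=list)


def generate(
    propose_patch: Callable[[list[Attempt]], str],
    run_tests: Callable[[str], tuple[str, list[str]]],
    max_attempts: int = 3,
) -> tuple[Optional[str], list[Attempt]]:
    """Retry patch generation, passing the full attempt history to each new try."""
    history: list[Attempt] = []
    for _ in range(max_attempts):
        # The generator sees every earlier patch together with its failure messages.
        patch = propose_patch(history)
        previous = next((a for a in history if a.patch == patch), None)
        if previous is not None:
            # Same patch as before: record the known outcome instead of re-running tests.
            note = "Identical to an earlier attempt."
            history.append(Attempt(patch, previous.status, [note] + previous.messages))
            continue
        status, messages = run_tests(patch)
        history.append(Attempt(patch, status, messages))
        if status == "PASSED":
            return patch, history
    # No passing patch within the attempt budget.
    return None, history
```

Injecting the two callables keeps the sketch testable with stubs; in a real generator they would wrap the LLM request and the test harness.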
The generate_patch method collects test errors when a test status is ERROR. These errors are then passed to subsequent iterations of the code generator to advise the LLM on how to avoid them. Consider collecting additional information and propagating it to future solve attempts.
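As one way to collect more than ERROR results, the sketch below assumes the test output includes pytest's short test summary (for example, when pytest runs with -rfE) and extracts both FAILED and ERROR lines along with their messages. The function name collect_failures and the shape of the returned dictionary are hypothetical; other test runners would need their own parser.

```python
import re

# Matches pytest "short test summary info" lines, for example:
#   FAILED tests/test_example.py::test_case - AssertionError: assert 1 == 2
SUMMARY_LINE = re.compile(r"^(FAILED|ERROR) (\S+)(?: - (.*))?$")


def collect_failures(test_output: str) -> dict[str, list[str]]:
    """Group failure messages by status, covering FAILED as well as ERROR."""
    collected: dict[str, list[str]] = {"FAILED": [], "ERROR": []}
    for line in test_output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            status, test_id, message = match.groups()
            # Some summary lines carry no message; keep the test id regardless.
            collected[status].append(f"{test_id}: {message or '(no message)'}")
    return collected
```

The collected messages could then be attached to the corresponding attempt record so later iterations see why each test failed, not only that it failed.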
Here's an example in which the LLM just keeps making the same mistake, because we don't propagate the test errors:
scikit-learn__scikit-learn-13124.zip