Closed kgilpin closed 3 weeks ago
Title: Report Test Path and Error for Failing or Erroneous Tests in Initial Enumeration
Problem: When executing the initial test selection phase, some test cases fail or encounter errors, as observed in the provided error log. During this process, the specific path of the test and the corresponding error should be reported to aid in diagnosing and resolving issues within the test harness. These failures or errors may lead to undetected or failed solutions, making it imperative to report the issues promptly.
Analysis:
In the example provided, a UnicodeEncodeError occurs during database setup, revealing a problem in encoding certain Unicode characters. Errors like these during the initial test enumeration can halt the process, leading to missed solutions or incorrect assumptions about test success. To effectively diagnose these issues, the system should report the specific test path that caused the error and the associated error message. This information can inform debugging efforts and allow for quick resolution of underlying issues in the test harness or the test environment configuration.
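As a rough illustration of the reporting described above, here is a minimal sketch that pulls the test identifier and the final error line out of a test log. It assumes the log uses the standard unittest block layout (an "ERROR:" or "FAIL:" header followed by a traceback); the names ReportedIssue and extract_test_issues are hypothetical and are not part of the solver or of swebench, and the runner's test identifier stands in for the test path.

```python
import re
from typing import List, NamedTuple


class ReportedIssue(NamedTuple):
    status: str  # "ERROR" or "FAIL"
    test: str    # test identifier exactly as the runner printed it
    error: str   # last traceback line, usually "ExceptionType: message"


# Assumed unittest-style block header, e.g. "ERROR: test_x (pkg.mod.Case)".
HEADER = re.compile(r"^(ERROR|FAIL): (.+)$")
# Rows of "=" or "-" that unittest prints around each block.
SEPARATOR = re.compile(r"^(={20,}|-{20,})$")


def extract_test_issues(log: str) -> List[ReportedIssue]:
    """Return one ReportedIssue per ERROR/FAIL block found in the log."""
    issues: List[ReportedIssue] = []
    status = test = None
    body: List[str] = []

    def flush() -> None:
        # Close the block currently being collected, if there is one.
        nonlocal status, test, body
        if status is not None:
            issues.append(ReportedIssue(status, test, body[-1] if body else ""))
        status, test, body = None, None, []

    for line in log.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            status, test = match.group(1), match.group(2)
        elif status is not None:
            if SEPARATOR.match(line):
                # The separator right after the header has no body yet;
                # a later separator marks the end of the traceback.
                if body:
                    flush()
            elif line.strip():
                body.append(line.strip())
    flush()
    return issues


# Hypothetical sample log, not taken from the linked CI run.
sample_log = "\n".join([
    "=" * 70,
    "ERROR: test_unicode (app.tests.ExampleTests)",
    "-" * 70,
    "Traceback (most recent call last):",
    '  File "tests.py", line 3, in test_unicode',
    "UnicodeEncodeError: 'ascii' codec can't encode character",
    "",
    "-" * 70,
    "Ran 1 test in 0.001s",
    "",
    "FAILED (errors=1)",
])

for issue in extract_test_issues(sample_log):
    print(f"{issue.status} {issue.test}: {issue.error}")
```

Keeping only the last traceback line keeps the report short; a real implementation might also keep the full traceback for cases where the final line alone is not enough to diagnose the problem.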
Proposed Changes:
solver/solve_instance.py: Update the report_error function to include additional information about the test path along with the error details.
solver/workflow/summarize_test_errors.py: Update the summarize_test_errors function to extract and return the test path alongside the error messages.
swebench/harness/grading.py: Update get_eval_report or utility functions to return more detailed error information when failures occur.
swebench/harness/run_evaluation.py: Ensure that when a UnicodeEncodeError occurs, the system reports the full details, including the test path and error, to the calling functions.
@dividedmind this test fails without any modification by us; any thoughts on why it's blowing up on this encoding issue?
Tests can fail because of things like specific environment setup that's needed; we do need to filter these out, as we do, but I don't think we should spend time to track down each one.
> why it's blowing up on this encoding issue?
My guess is that it's due to the container environment being incomplete and not setting locale correctly. Setting PYTHONIOENCODING=utf-8 in the environment might help.
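For reference, a minimal sketch of what setting that variable looks like when launching a Python child process. It uses a local subprocess as a stand-in for the container, and the helper name run_with_utf8_io is hypothetical; whether this actually resolves the harness error is untested here.

```python
import os
import subprocess
import sys


def run_with_utf8_io(cmd):
    """Run cmd with PYTHONIOENCODING=utf-8 added to the inherited environment."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    return subprocess.run(cmd, env=env, capture_output=True, encoding="utf-8")


# Ask the child interpreter which encoding it uses for stdout.
result = run_with_utf8_io(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"]
)
print(result.stdout.strip())
```

In a container, the equivalent would be to export the variable in the image or pass it when the test command is started, rather than relying on the locale being configured.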
When the solver encounters a test case that fails or errors during initial test selection, report the test path and the error.
This can be used to identify problems with the test harness that result in missed / failed solutions.
An example:
https://github.com/getappmap/navie-benchmark/actions/runs/11544982873/job/32131260977#step:7:449