Open kgilpin opened 3 weeks ago
Title: Enhance Logging for 0-Scoring Instances
Problem:
Instances that score 0 indicate potential issues with the solver, such as improper handling of LLM output or unexpected behavior in the options passed to Navie. Identifying these instances can help diagnose and improve solver performance. The goal is to report these instances by logging them to the console and collecting them into a file dedicated to 0-scoring instances.
Analysis:
To address this task, we need to implement a mechanism to track and report instances that result in a 0 score. The process should consist of two major enhancements:
- Log 0-scoring instances to the console.
- Collect 0-scoring instances into a specific log file. This log file will then be marked as an output artifact in the build process, ensuring its availability for further analysis.
The solution will involve:
- Identifying instances with 0 scores.
- Writing 0-scoring instances to a dedicated file.
Proposed Changes:
solver/report.py:
- Identify instances that score 0.
- Log each instance with a 0 score to the console for immediate visibility during execution.
- Ensure that 0-scoring instances are aggregated and written into a separate dedicated log file.
solver/report.py:
- In the Report class, introduce a function dedicated to handling 0-scoring instances. This function will append relevant data about these instances to a newly designated file named 0-scores.log.
run_evaluation.py:
- Call the function that collects 0-scoring instances and writes them to the specified file.
Modularize the logging setup:
- Extend the logging setup so that it also covers 0-scoring results.
By making these changes, the user will gain clear visibility into instances where the solver's performance might be deficient. Collecting this data will provide actionable insights to enhance the solver's robustness over time.
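As a rough illustration of the plan above, the logging setup could be sketched with Python's standard `logging` module: one logger with a console handler and a file handler, plus a small helper that filters results down to the 0 scores. The function names, the logger name `zero_scores`, and the result shape (dicts with `instance_id` and `score` keys) are assumptions made for this sketch, not the solver's actual API.

```python
import logging


def configure_zero_score_logger(log_path="0-scores.log"):
    """Return a logger that writes each message to the console and to log_path.

    The logger name "zero_scores" is hypothetical.
    """
    logger = logging.getLogger("zero_scores")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:  # configure only once per process
        formatter = logging.Formatter("%(message)s")
        console = logging.StreamHandler()
        console.setFormatter(formatter)
        file_handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
        file_handler.setFormatter(formatter)
        logger.addHandler(console)
        logger.addHandler(file_handler)
    return logger


def report_zero_scores(results, logger):
    """Log every result whose score is 0 and return those results.

    `results` is assumed to be an iterable of dicts with
    "instance_id" and "score" keys.
    """
    zero = [r for r in results if r.get("score") == 0]
    for r in zero:
        logger.info("0-scoring instance: %s", r["instance_id"])
    return zero
```

Keeping the handler setup in its own function means the report code only needs a logger object, so the destination file can be changed without touching the filtering logic.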
Instances that score 0 are a great place to look when figuring out what is going wrong with a solver, or when looking for ways to make a solver better.
For example, 0-scoring examples will often show cases in which the LLM output is not being handled properly, or where the options passed to Navie are not behaving as expected.
The solver should print 0-scoring solve logs to the console, and also collect them into a dedicated file. This 0-scoring log file should be uploaded as an Action output as well.
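One way the collection step could look, as a minimal sketch: given a mapping of instance IDs to scores and a directory of per-instance solve logs, print each 0-scoring log and append it to a single file. The `<instance_id>.log` layout, the function name, and the header format are assumptions for illustration only; the solver's real log layout may differ.

```python
from pathlib import Path


def collect_zero_score_logs(scores, log_dir, output_file="0-scores.log"):
    """Print and append the solve log of every instance that scored 0.

    scores:  mapping of instance ID -> score (assumed shape)
    log_dir: directory holding one "<instance_id>.log" per instance (assumed layout)
    Returns the list of instance IDs that were collected.
    """
    collected = []
    with Path(output_file).open("a", encoding="utf-8") as out:
        for instance_id, score in scores.items():
            if score != 0:
                continue
            log_path = Path(log_dir) / f"{instance_id}.log"
            if log_path.exists():
                text = log_path.read_text(encoding="utf-8")
            else:
                text = "(solve log not found)\n"
            entry = f"=== {instance_id} (score 0) ===\n{text}"
            if not entry.endswith("\n"):
                entry += "\n"
            print(entry, end="")  # immediate visibility on the console
            out.write(entry)      # collected copy for later upload
            collected.append(instance_id)
    return collected
```

The resulting file is a plain concatenation, so the CI job only has to publish one path as its output.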