getappmap / navie-benchmark

Navie benchmarks

Create a concise report of 0-scoring instances #84

Open · kgilpin opened this issue 3 weeks ago

kgilpin commented 3 weeks ago

Instances that score 0 are a great place to look when figuring out what is going wrong with a solver, or when looking for ways to make a solver better.

For example, 0-scoring instances often reveal cases in which the LLM output is not being handled properly, or in which the options passed to Navie are not behaving as expected.

The solver should print 0-scoring solve logs to the console, and also collect them into a dedicated file. This 0-scoring log file should be uploaded as an Action output as well.

github-actions[bot] commented 3 weeks ago

Title: Enhance Logging for 0-Scoring Instances

Problem: Instances that score 0 indicate potential issues with the solver, such as improper handling of LLM output or unexpected behavior in the options passed to Navie. Identifying these instances can help diagnose and improve solver performance. The goal is to report these instances by logging them to the console and collecting them into a file dedicated to 0-scoring instances.

Analysis: To address this task, we need to implement a mechanism to track and report instances that result in a 0 score. The process should consist of two major enhancements:

  1. Modifying the current logging functionality to include additional logging for 0-scoring instances.
  2. Implementing a new feature to collect the details of 0-scoring instances into a specific log file. This log file will then be marked as an output artifact in the build process, ensuring its availability for further analysis.

Proposed Changes:

  1. solver/report.py:

    • Modify the logic in the section that collects evaluation reports to include a conditional check for instances with a score of 0.
    • Print instances with a 0 score to the console for immediate visibility during execution.
    • Aggregate these 0-scoring instances and, within the Report class, introduce a function dedicated to writing their details into a separate dedicated log file, 0-scores.log (a sketch of this change follows the list).
  2. run_evaluation.py:

    • After the evaluation script finishes and reports are gathered, invoke the logic that collects 0-scoring instances and writes them to the specified file (see the call-site example below).
    • Ensure that this file is uploaded as an Action output so that it is retained beyond the execution session for further analysis.
  3. Modularize the logging setup:

    • Centralize the logging configuration to ensure consistency across the various logging demands, especially when printing the 0-scoring results (a minimal configuration sketch appears after this list).
    • Keep the setup flexible enough to accommodate future enhancements in logging detail level and output destinations.
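To make the report-side change concrete, here is a minimal sketch. The names EvaluationRecord, collect_zero_scoring, and the record fields (instance_id, score, log) are hypothetical placeholders, not the existing solver/report.py API; they only indicate the shape of the change.

```python
# Hypothetical sketch: collect 0-scoring instances, print them to the console,
# and append their solve logs to a dedicated 0-scores.log file.
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass
class EvaluationRecord:
    instance_id: str
    score: float
    log: str  # solve log captured for this instance


def collect_zero_scoring(
    records: Iterable[EvaluationRecord],
    output_file: Path = Path("0-scores.log"),
) -> List[EvaluationRecord]:
    """Return the 0-scoring records, echoing each one to the console and to output_file."""
    zero_scoring = [r for r in records if r.score == 0]
    with output_file.open("a") as f:
        for record in zero_scoring:
            print(f"[0-score] {record.instance_id}")  # immediate console visibility
            f.write(f"=== {record.instance_id} ===\n{record.log}\n\n")
    return zero_scoring
```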
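The call site in run_evaluation.py would then be a few lines at the end of the run. Here `reports` stands in for whatever collection of evaluation results the script already builds, and the workflow's artifact-upload step (for example actions/upload-artifact) would pick up 0-scores.log from the working directory.

```python
# Hypothetical call site at the end of run_evaluation.py, after reports are gathered.
zero_scoring = collect_zero_scoring(reports)
print(f"{len(zero_scoring)} instance(s) scored 0; details written to 0-scores.log")
```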
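For the logging modularization, one option is a shared helper that every module imports. The module name and handler layout below are assumptions for illustration, built only on the standard library logging module.

```python
# Hypothetical shared logging setup, e.g. a solver/log_config.py module.
import logging
from pathlib import Path


def configure_logging(
    level: int = logging.INFO,
    zero_score_file: Path = Path("0-scores.log"),
) -> logging.Logger:
    """Attach a console handler to the solver logger and a file handler
    to a child logger reserved for 0-scoring results."""
    logger = logging.getLogger("solver")
    logger.setLevel(level)
    logger.addHandler(logging.StreamHandler())

    zero_score_logger = logging.getLogger("solver.zero_scores")
    zero_score_logger.addHandler(logging.FileHandler(zero_score_file))
    return logger
```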

By making these changes, the user gains clear visibility into instances where the solver underperforms. Collecting this data will provide actionable insight for improving the solver's robustness over time.