NingMiao / SelfCheck

Code for the paper "SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning"

Feedback on the "SelfCheck" Paper: Considerations Regarding the Result Integration #1

Open Mrw33554432 opened 1 year ago

Mrw33554432 commented 1 year ago

Firstly, I'd like to commend the authors on the comprehensive methodology presented in the paper. I've taken the time to thoroughly understand the approach, and while the majority of the content is clear, I have some reservations about the "RESULTS INTEGRATION" section.

  1. Alternative Approaches to Result Integration:

    • Iterative Regeneration: One potential enhancement could be to iteratively regenerate steps until a consensus is reached, i.e., until the LLM consistently produces a step that is supported. This might offer a more robust verification mechanism.
    • Voting Mechanism: Another straightforward approach could be to let the LLM vote between the original and the regenerated step. Given the LLM's capabilities, this could serve as an effective method to determine the correctness of each step (a sketch of these two ideas follows this list).
  2. Concerns with Current Approach:

    • In the context of solving mathematical problems, a single incorrect step can often invalidate the entire solution. Given this, I wonder whether it is beneficial to proceed with subsequent steps once an error is detected. It might be more efficient to rectify the erroneous step before moving forward, because the integration formula would otherwise take those erroneous subsequent steps into account.
  3. Potential Enhancement: Tree-Structured Agent:

    • Building on the current methodology, there seems to be an opportunity to develop an agent that can solve mathematical problems with high accuracy. By running the step check on each step and using one of the aforementioned methods to validate each step, we could construct a tree-like structure. Each node in this tree would represent a step, and each step would be validated by multiple attempts. This would ensure that the entire solution process is supported and verified at each stage.
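Here is a minimal sketch of how the regeneration-plus-voting idea might look, which would also form the per-node check in the tree structure above. The `query_llm` wrapper and the prompts are hypothetical placeholders, not part of this repo:

```python
import collections

def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever LLM API is in use."""
    raise NotImplementedError

def regenerate_step(context: str, n_samples: int = 3) -> list[str]:
    # Sample several independent candidate versions of the current step.
    prompt = f"Given the reasoning so far:\n{context}\nWrite the next step."
    return [query_llm(prompt) for _ in range(n_samples)]

def vote_on_step(original: str, candidates: list[str]) -> str:
    # Let the LLM vote between the original step and each regenerated one;
    # keep whichever step wins the most pairwise comparisons.
    votes = collections.Counter({original: 0})
    for cand in candidates:
        reply = query_llm(
            "Which of these reasoning steps is more likely correct? "
            f"Answer A or B only.\nA: {original}\nB: {cand}")
        winner = original if reply.strip().upper().startswith("A") else cand
        votes[winner] += 1
    return votes.most_common(1)[0][0]
```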

I believe these suggestions could further refine the approach and enhance the robustness of the methodology. I hope this feedback is constructive and aids in the evolution of this research.

Mrw33554432 commented 1 year ago

One more observation-based suggestion: if you used the voting method above and the model gave three different answers at a specific step, most likely all of them are wrong, since it implies the model cannot handle that step.
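That heuristic could be folded into the sketch above along these lines (again illustrative, assuming the same sampled candidates):

```python
import collections

def consensus_or_none(candidates: list[str]) -> str | None:
    # If independent samples all disagree, the model likely cannot handle
    # this step reliably; return None instead of trusting any one answer.
    counts = collections.Counter(candidates)
    answer, freq = counts.most_common(1)[0]
    return answer if freq >= 2 else None
```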

As for the code in run.py, I would recommend breaking it into more classes and functions, as it is currently very long and hard to read. Moving the prompts into a separate module would also help a lot.

NingMiao commented 1 year ago

Thanks @Mrw33554432 for your suggestions! Iterative/tree-structured generation and voting by the LLM are brilliant ideas for using the results of step-level self-checking to improve LLM reasoning.

The reason we do not reject an answer as soon as an error is detected is that the checker itself sometimes makes mistakes. Especially for long reasoning chains with more than 10 steps, it is likely that at least one correct step is erroneously flagged as incorrect. If we simply rejected every answer in which an error is detected, we would lose a lot of diversity when voting.
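In other words, failed checks soften a solution's vote rather than silencing it. A sketch of that idea, where the exponential down-weighting and the lambda values are illustrative assumptions, not the paper's exact integration function:

```python
import collections
import math

def solution_weight(step_results: list[int],
                    lambda_fail: float = 1.0,
                    lambda_unclear: float = 0.3) -> float:
    # step_results holds per-step check outcomes, e.g. 1 = supported,
    # 0 = unclear, -1 = contradicted. Failed or unclear checks lower the
    # solution's weight instead of rejecting it outright, since the
    # checker itself can be wrong on individual steps.
    n_fail = sum(r == -1 for r in step_results)
    n_unclear = sum(r == 0 for r in step_results)
    return math.exp(-lambda_fail * n_fail - lambda_unclear * n_unclear)

def weighted_vote(answers: list[str], weights: list[float]) -> str:
    # Confidence-weighted majority vote over the final answers of several
    # sampled solutions; a mistaken flag dampens a vote but never drops it.
    scores = collections.defaultdict(float)
    for ans, w in zip(answers, weights):
        scores[ans] += w
    return max(scores, key=scores.get)
```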

Thank you also for your comment on run.py. I will clean it up soon.

Cheers, Ning