Mrw33554432 opened this issue 1 year ago
Oh, and an observation-based suggestion: if you use the method I mentioned above and the model gives three different answers at a specific step, most likely all of them are wrong, because it implies the model cannot handle that question.
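To illustrate the rule I have in mind, here is a minimal sketch (the sampled answers would come from your existing generation code; nothing here is taken from run.py):

```python
from collections import Counter

def vote_on_step(candidate_answers):
    """Majority-vote over sampled answers for one reasoning step.

    If every sampled answer is different, there is no majority signal,
    which suggests the model cannot reliably handle this step, so we
    abstain instead of trusting any single sample.
    """
    counts = Counter(candidate_answers)
    answer, votes = counts.most_common(1)[0]
    if votes == 1 and len(candidate_answers) > 1:
        return None  # all samples disagree: treat the step as unreliable
    return answer

# Three samples, all different -> abstain
print(vote_on_step(["12", "15", "9"]))   # None
# Two out of three agree -> keep the majority answer
print(vote_on_step(["12", "12", "9"]))   # "12"
```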
As for the code in run.py, I would recommend splitting it into more classes and methods, as it is currently too long and hard to read. Moving the prompts to a separate place would also help a lot; a rough sketch of what I mean follows below.
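Just as an illustration of the structure I have in mind (the names and templates here are made up, not the actual contents of run.py):

```python
# prompts.py -- keep all prompt templates in one place
STEP_CHECK_PROMPT = (
    "You are verifying step {step_idx} of a solution.\n"
    "Question: {question}\n"
    "Step: {step}\n"
    "Is this step correct? Answer yes or no and explain briefly."
)

# run.py -- the pipeline only imports and formats the templates
from prompts import STEP_CHECK_PROMPT

class StepChecker:
    def __init__(self, llm):
        self.llm = llm  # any callable mapping a prompt string to model text

    def check(self, question, step, step_idx):
        prompt = STEP_CHECK_PROMPT.format(
            question=question, step=step, step_idx=step_idx
        )
        return self.llm(prompt)
```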
Thanks @Mrw33554432 for your suggestions! Iterative/tree-structured generation and voting by an LLM are brilliant ideas for using the results of step self-checking to improve LLM reasoning.
The reason for not rejecting an answer as soon as an error is detected is that the checker itself sometimes makes mistakes. Especially for long reasoning chains with more than 10 steps, it is likely that a correct step is erroneously flagged as incorrect. If we simply rejected every answer in which an error is detected, we would lose a lot of diversity when voting.
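In other words, the idea is closer to down-weighting than to rejecting. A minimal sketch, assuming per-step confidence scores from the checker (the data layout and helper name here are made up for illustration):

```python
from collections import defaultdict

def weighted_vote(solutions):
    """Soft integration: every solution keeps a vote, scaled by how
    confident the checker is in its steps, instead of being discarded
    as soon as one step is flagged.

    `solutions` is a list of (final_answer, step_confidences) pairs,
    where step_confidences are checker scores in [0, 1] for each step.
    """
    scores = defaultdict(float)
    for answer, step_confidences in solutions:
        # A flagged step lowers the weight but does not zero it out,
        # because the checker itself can be wrong on long chains.
        weight = sum(step_confidences) / len(step_confidences)
        scores[answer] += weight
    return max(scores, key=scores.get)

# One solution has a (possibly spurious) flagged step but still
# contributes to the vote for its answer.
solutions = [
    ("42", [0.9, 0.8, 0.2]),   # one suspicious step
    ("42", [0.9, 0.9, 0.9]),
    ("17", [0.6, 0.5, 0.7]),
]
print(weighted_vote(solutions))  # "42"
```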
Thank you also for your comment on run.py. I will clean it up soon.
Cheers, Ning
Firstly, I'd like to commend the authors on the comprehensive methodology presented in the paper. I've taken the time to thoroughly understand the approach, and while the majority of the content is clear, I have some reservations about the "RESULTS INTEGRATION" section.
Alternative Approaches to Result Integration:
Concerns with Current Approach:
Potential Enhancement: Tree-Structured Agent:
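To make the tree-structured idea concrete, here is a minimal sketch, assuming hypothetical `propose_next_steps` and `check_step` callables (neither is a function from this repo):

```python
def tree_search(question, propose_next_steps, check_step,
                max_depth=5, branch=3, keep=2):
    """Grow a small tree of partial solutions: expand several candidate
    next steps at each level and keep only those the checker rates
    highest, instead of committing to a single linear chain.
    """
    frontier = [([], 1.0)]  # (steps_so_far, accumulated_check_score)
    for _ in range(max_depth):
        candidates = []
        for steps, score in frontier:
            for step in propose_next_steps(question, steps, n=branch):
                step_score = check_step(question, steps, step)  # in [0, 1]
                candidates.append((steps + [step], score * step_score))
        if not candidates:
            break
        # Prune: keep only the highest-scoring partial solutions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        frontier = candidates[:keep]
    return frontier  # best reasoning chains with their scores
```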
I believe these suggestions could further refine the approach and enhance the robustness of the methodology. I hope this feedback is constructive and aids in the evolution of this research.