RomanKoshkin closed this issue 1 month ago.
Could you tell me which paper(s) this code https://github.com/codelion/optillm/blob/main/optillm/mcts.py is based on? Thanks!
It is not a direct implementation of any single paper, but I took inspiration from "Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning". The technique in that paper is applied during training, not inference, whereas in this library I apply it at inference time.
The same is true of "Prover-Verifier Games improve legibility of LLM outputs": it is also a training technique, but this library implements it at inference time.