Closed billxbf closed 5 months ago
Yes, your concern is almost correct. In Fig. 3(a), we show the solve rate of MCTS at different difficulty levels; you can see that in the first round, the success rate on level-5 problems is significantly lower. For temperature, we use 0.6 (a higher value might be better).
Got it, so when the expanded tree of a source q doesn't reach any positive leaf x+, you just drop that case for that iteration?
One negative path could possibly be sampled and used for value loss.
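If I'm reading the iteration logic right, the handling could be sketched like this (all names here are hypothetical, not from the paper's code): questions whose tree yields no positive leaf are dropped from the policy objective for that round, but one sampled negative path can still feed the value loss.

```python
import random

def select_training_paths(rollouts, max_negatives=1):
    """Split one question's MCTS rollouts into policy/value training data.

    `rollouts` is a list of (path, reward) pairs, where reward is +1 for a
    correct terminal state and -1 otherwise. Returns (policy_paths,
    value_paths). This is an illustrative sketch, not the authors' code.
    """
    positives = [p for p, r in rollouts if r == +1]
    negatives = [p for p, r in rollouts if r == -1]

    if not positives:
        # No correct trajectory found this iteration: drop the question
        # from the policy loss, but a sampled negative path can still
        # supervise the value head.
        value_paths = random.sample(negatives, min(max_negatives, len(negatives)))
        return [], value_paths

    # Otherwise, positives drive the policy loss and all terminal
    # paths can contribute to the value loss.
    return positives, positives + negatives
```

So a hard question isn't wasted entirely: it produces no policy signal that round, but the value estimator still learns that those paths fail.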
This work is beautiful. Great job! 🔥 I have a minor concern about the MCTS evaluation. If a problem is very hard, do you observe that the pretrained LLM is unable to sample any correct trajectory, i.e., to reach any terminal state with a +1 reward? I feel that raising diversity by tweaking the temperature might not be enough to guarantee a good candidate.