MARIO-Math-Reasoning / Super_MARIO

MIT License

Concern on (first few rounds) sampling efficacy #1

Closed billxbf closed 5 months ago

billxbf commented 5 months ago

This work is beautiful. Great job! 🔥 I have a minor concern about the MCTS evaluation. If a problem is very hard, do you observe that a pretrained LLM is unable to sample any correct trajectory, i.e., to reach any terminal state with a +1 reward? I feel that raising diversity by tweaking the temperature or similar settings might not be enough to guarantee a good candidate.

lovecambi commented 5 months ago

Yes, your concern is almost correct. In Fig. 3(a), we show the solving rate of MCTS at different difficulty levels. You can see that in the first round, the success rate for level 5 problems is significantly lower. For temperature, we use 0.6 (maybe higher is better).

billxbf commented 5 months ago

Got it. So when the expanded tree of a source question q doesn't reach any positive leaf x+, you just drop that case for that iteration?

lovecambi commented 5 months ago

Not necessarily. Even when no positive leaf is reached, one negative path could still be sampled and used for the value loss.
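To make that fallback concrete, here is a minimal sketch of how training paths might be picked from one expanded tree. It is not taken from the Super_MARIO code. `Path`, `select_training_paths`, and the ±1 reward encoding are hypothetical names chosen for illustration, and using every terminal path for the value loss when a positive leaf exists is an assumption rather than something stated in this thread.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Path:
    """One root-to-terminal trajectory from the search tree (hypothetical type)."""
    steps: List[str]
    reward: float  # assumed encoding: +1 for a correct terminal, -1 otherwise


def select_training_paths(
    paths: List[Path], rng: random.Random
) -> Tuple[List[Path], List[Path]]:
    """Return (policy_paths, value_paths) for one question.

    Assumption: positive paths feed the policy loss, and all terminal
    paths feed the value loss when at least one positive exists.
    Fallback described in the thread: with no positive leaf, keep a
    single sampled negative path for the value loss only.
    """
    positives = [p for p in paths if p.reward > 0]
    negatives = [p for p in paths if p.reward <= 0]

    if positives:
        return positives, positives + negatives

    # No correct trajectory was found: nothing to imitate for the policy,
    # but one negative path still gives the value model a training signal.
    value_paths = [rng.choice(negatives)] if negatives else []
    return [], value_paths


if __name__ == "__main__":
    rng = random.Random(0)
    hard_question = [Path(["step a"], -1.0), Path(["step b"], -1.0)]
    policy_paths, value_paths = select_training_paths(hard_question, rng)
    print(len(policy_paths), len(value_paths))
```

Under these assumptions, a question with no positive leaf contributes nothing to the policy loss for that round but is not dropped entirely, since the sampled negative path still contributes to the value loss.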