Open annisamansa opened 18 hours ago
You are almost correct.
For round 1, we use the ReAct format, largely for legacy reasons: it is the popular format for tool use. In addition, we think it is very likely that recent pretraining data includes examples in this format, and instruction following with ReAct seems easier for base LLMs.
For the other rounds, we switch to the HTML format. The reason is, as you said, that the HTML format performs better than ReAct in SFT. We hypothesize that the pretraining corpus includes large-scale HTML data scraped from websites. For example, tags such as `<p></p>` and `<code></code>` should appear with high frequency in pretraining data, so SFT only has to teach the LLM a few new tags such as `<step></step>`.
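To make the two formats concrete, here is a minimal sketch that converts a round-1 ReAct-style trace into an HTML-tag format. The action name, the sample content, and the exact tag set (`<step>`, `<action>`, `<input>`, `<observation>`) are illustrative assumptions of the kind of new tags SFT would teach, not the actual schema used after round 1:

```python
import re

# A hypothetical round-1 trajectory in the ReAct-style few-shot format
# (Thought / Action / Action Input / Observation).
react_trace = """Thought: I need the population of Paris.
Action: search
Action Input: population of Paris
Observation: About 2.1 million (2023)."""

def react_to_html(trace: str) -> str:
    """Convert a ReAct-style trace into an HTML-tag format.

    The tag names here are hypothetical; the point is that only a few
    new tags need to be learned on top of tags already common in
    pretraining HTML data.
    """
    field_to_tag = {
        "Thought": "step",
        "Action": "action",
        "Action Input": "input",
        "Observation": "observation",
    }
    lines = []
    for field, tag in field_to_tag.items():
        # Match "Field: value" at the start of a line.
        m = re.search(rf"^{field}: (.*)$", trace, flags=re.MULTILINE)
        if m:
            lines.append(f"<{tag}>{m.group(1)}</{tag}>")
    return "\n".join(lines)

print(react_to_html(react_trace))
```

The same trajectory content is preserved; only the surface format changes, which is why round-1 outputs can be reformatted into training data for round 2.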
Thanks for your reply. By the way, have you ever tried using data in the HTML format in round 1?
Thank you for your excellent work. We are currently trying to reproduce it, and we are wondering what factors led you to use the few-shot Thought/Action/Action Input/Observation format in round 1 instead of a few-shot format closer to the final training data ( )?
Is it because the former is easier for the model to learn?
Round 2 is trained on the data generated by round 1. How is this format ( ) learned?