XuperX / TheCollector

https://xuperx.github.io/TheCollector/
MIT License

papers #15

Open XuperX opened 1 year ago

XuperX commented 1 year ago

Background: Although many popular LLMs are proprietary, there is a burgeoning movement in the LLM community to create powerful open-source models. Many open-source models have been proposed, but this movement was especially catalyzed by the recent release of LLaMA. Following LLaMA's release, the research community created numerous derivatives of the model that are fine-tuned via an imitation approach. In other words, we take dialogue examples from a proprietary LLM (e.g., ChatGPT) and directly fine-tune the open-source LLM (i.e., LLaMA) on this data.

Poor Evaluation: Initially, the evaluation of these imitation models indicated that they came near the performance of top proprietary LLMs. If true, this finding would mean that creating open-source replicas of proprietary LLMs is easy and cheap. However, subsequent research concluded that the evaluation of such models was incomplete and misleading, finding that these models actually have large gaps in their comprehension. These gaps in performance were missed in human evaluations due to the ability of imitation models to replicate the style of LLMs like ChatGPT (but not their reasoning and knowledge base).

“Our research indicates that learning from step-by-step explanations, whether these are generated by humans or more advanced AI models, is a promising direction to improve model capabilities and skills.” - from Orca paper

Orca: Recently, however, the Orca publication (from Microsoft) showed us that imitation learning is not a dead end! Put simply, the problem with existing imitation models is that their fine-tuning dataset is too small/narrow and does not contain sufficient details from the “teacher” model. By creating a better imitation fine-tuning dataset, Orca is able to achieve performance comparable to ChatGPT across a wide scope of standardized benchmarks.

How do we do this? Prior imitation models are trained over pairs of prompts and associated responses generated from an LLM. This information is shallow and lacks details regarding why/how a response was produced. To generate more detailed data, Orca prompts the teacher model to output a detailed explanation along with its normal response. Drawing upon ideas like zero-shot CoT prompting, we can encourage the model to produce detailed explanations with each of its responses by just tweaking the system message. Then, we can fine-tune the imitation model using both the response and the explanation as a training signal.
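The data-collection idea above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration (the message format, field names, and helper functions are assumptions, not taken from the Orca paper): a system message nudges the teacher to explain its reasoning step by step, and the explanation-rich output is stored as the completion the student model is trained to reproduce.

```python
# Hypothetical sketch of Orca-style data generation. The system message,
# function names, and record fields below are illustrative assumptions.

# System message that encourages zero-shot CoT-style detailed explanations.
SYSTEM_MESSAGE = (
    "You are a helpful assistant. Think step by step and give a detailed "
    "explanation of your reasoning before stating the final answer."
)


def build_teacher_prompt(user_query: str) -> list[dict]:
    """Assemble the chat messages sent to the teacher LLM (e.g., GPT-4).

    Only the system message changes relative to plain prompting; the
    user query itself is untouched.
    """
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": user_query},
    ]


def build_finetune_example(user_query: str, teacher_output: str) -> dict:
    """Pack the teacher's explanation-rich output into a training record.

    The student (e.g., LLaMA) is fine-tuned to reproduce the full
    completion, so the explanation itself becomes part of the training
    signal, not just the final answer.
    """
    return {
        "system": SYSTEM_MESSAGE,
        "prompt": user_query,
        "completion": teacher_output,  # explanation + final answer together
    }


# Example usage with a stand-in teacher response:
record = build_finetune_example(
    "What is 17 * 24?",
    "Step 1: 17 * 20 = 340. Step 2: 17 * 4 = 68. "
    "Step 3: 340 + 68 = 408. Final answer: 408.",
)
```

The key design point is that the only change versus naive imitation is in the prompt sent to the teacher; the extra detail in the completions is what closes the gap between mimicking ChatGPT's style and learning its reasoning.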

TL;DR: Orca proposes a much larger (6M examples from ChatGPT and GPT-4) imitation fine-tuning dataset that is augmented with step-by-step problem-solving details for each of a model’s responses. By fine-tuning LLaMA over this data, we get an open-source LLM that actually comes near the performance of ChatGPT. With this in mind, imitation is a valid approach for creating useful open-source LLMs given a much better source of fine-tuning data.

https://twitter.com/cwolferesearch/status/1673398297304911872
