XuperX / TheCollector

https://xuperx.github.io/TheCollector/
MIT License

papers #15

Open XuperX opened 1 year ago

XuperX commented 1 year ago

Background: Although many popular LLMs are proprietary, there is a burgeoning movement in the LLM community to create powerful open-source models. Many open-source models have been proposed, but this movement was especially catalyzed by the recent release of LLaMA. Following LLaMA's release, the research community created numerous derivatives of the model that are fine-tuned via an imitation approach. In other words, we take dialogue examples from a proprietary LLM (e.g., ChatGPT) and directly fine-tune the open-source LLM (i.e., LLaMA) on this data.

Poor Evaluation: Initially, the evaluation of these imitation models indicated that they came near the performance of top proprietary LLMs. If true, this finding would mean that creating open-source replicas of proprietary LLMs is easy and cheap. However, subsequent research concluded that the evaluation of such models was incomplete and misleading, finding that these models actually have large gaps in their comprehension. These gaps in performance were missed in human evaluations due to the ability of imitation models to replicate the style of LLMs like ChatGPT (but not their reasoning and knowledge base).

“Our research indicates that learning from step-by-step explanations, whether these are generated by humans or more advanced AI models, is a promising direction to improve model capabilities and skills.” - from Orca paper

Orca: Recently, however, the Orca publication (from Microsoft) showed us that imitation learning is not a dead end! Put simply, the problem with existing imitation models is that their fine-tuning dataset is too small/narrow and does not contain sufficient details from the “teacher” model. By creating a better imitation fine-tuning dataset, Orca is able to achieve performance comparable to ChatGPT across a wide scope of standardized benchmarks.

How do we do this? Prior imitation models are trained over pairs of prompts and associated responses generated from an LLM. This information is shallow and lacks details regarding why/how a response was produced. To generate more detailed data, Orca prompts the teacher model to output a detailed explanation along with its normal response. Drawing upon ideas like zero-shot CoT prompting, we can encourage the model to produce detailed explanations with each of its responses by just tweaking the system message. Then, we can fine-tune the imitation model using both the response and the explanation as a training signal.
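The data-collection idea above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration (the message format, field names, and helper functions are assumptions, not taken from the Orca paper): a system message nudges the teacher to explain its reasoning step by step, and the explanation-rich output is stored as the completion the student model is trained to reproduce.

```python
# Hypothetical sketch of Orca-style data generation. The system message,
# function names, and record fields below are illustrative assumptions.

# System message that encourages zero-shot CoT-style detailed explanations.
SYSTEM_MESSAGE = (
    "You are a helpful assistant. Think step by step and give a detailed "
    "explanation of your reasoning before stating the final answer."
)


def build_teacher_prompt(user_query: str) -> list[dict]:
    """Assemble the chat messages sent to the teacher LLM (e.g., GPT-4).

    Only the system message changes relative to plain prompting; the
    user query itself is untouched.
    """
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": user_query},
    ]


def build_finetune_example(user_query: str, teacher_output: str) -> dict:
    """Pack the teacher's explanation-rich output into a training record.

    The student (e.g., LLaMA) is fine-tuned to reproduce the full
    completion, so the explanation itself becomes part of the training
    signal, not just the final answer.
    """
    return {
        "system": SYSTEM_MESSAGE,
        "prompt": user_query,
        "completion": teacher_output,  # explanation + final answer together
    }


# Example usage with a stand-in teacher response:
record = build_finetune_example(
    "What is 17 * 24?",
    "Step 1: 17 * 20 = 340. Step 2: 17 * 4 = 68. "
    "Step 3: 340 + 68 = 408. Final answer: 408.",
)
```

The key design point is that the only change versus naive imitation is in the prompt sent to the teacher; the extra detail in the completions is what closes the gap between mimicking ChatGPT's style and learning its reasoning.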

TL;DR: Orca proposes a much larger (6M examples from ChatGPT and GPT-4) imitation fine-tuning dataset that is augmented with step-by-step problem-solving details for each of a model’s responses. By fine-tuning LLaMA over this data, we get an open-source LLM that actually comes near the performance of ChatGPT. With this in mind, imitation is a valid approach for creating useful open-source LLMs given a much better source of fine-tuning data.

https://twitter.com/cwolferesearch/status/1673398297304911872
