When large language models are aligned via supervised fine-tuning, they may encounter new factual information that was not acquired through pre-training. It is often conjectured that this can teach the model the behavior of hallucinating factually incorrect responses, as the model is trained to generate facts that are not grounded in its pre-existing knowledge. In this work, we study the impact of such exposure to new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge. To this end, we design a controlled setup, focused on closed-book QA, where we vary the proportion of the fine-tuning examples that introduce new knowledge. We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning, as fine-tuning examples that introduce new knowledge are learned significantly slower than those consistent with the model's knowledge. However, we also find that as the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate. Taken together, our results highlight the risk in introducing new factual knowledge through fine-tuning, and support the view that large language models mostly acquire factual knowledge through pre-training, whereas fine-tuning teaches them to use it more efficiently.