ZiminPark closed 3 months ago
We hypothesize that non-instruction fine-tuned language models learn to use these long contexts from similarly-formatted data that may occur in Internet text seen during pre-training, e.g., StackOverflow questions and answers.