I fine-tuned the Llama-2-13B-chat model with LoRA for a document summarization task. The source documents are much longer than the model's 4K context length, so I split each one into segments of under 3K tokens. After fine-tuning, I merged the adapter into the base model and ran inference. For some segments the output is garbage: duplicated (or near-duplicate) sentences and paragraphs, plus strange patterns such as the same 2 or 3 words repeated back to back before a full stop. Any ideas or thoughts?
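For reference, my merge-and-inference flow looks roughly like this (a minimal sketch using the transformers and peft APIs; the adapter path, prompt template, and generation settings are placeholders, not my exact values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

# Load the LoRA adapter and merge it into the base weights
# ("my-lora-adapter" is a placeholder path).
model = PeftModel.from_pretrained(base, "my-lora-adapter")
model = model.merge_and_unload()

# One segment (< 3K tokens) at a time; placeholder text here.
segment = "..."
prompt = f"Summarize the following text:\n{segment}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
    # Knobs like these are commonly suggested against repetition,
    # though I haven't confirmed they fix my case:
    repetition_penalty=1.1,
    no_repeat_ngram_size=4,
)
summary = tokenizer.decode(
    out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```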