Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

To mask or not to mask? #290

Open Dundalia opened 1 year ago

Dundalia commented 1 year ago

I am not understanding the conceptual usefulness of masking out the prompt. I have seen that there is a comment in scripts/prepare_alpaca.py that says:

mask_inputs: bool = False, # as in alpaca-lora

Is masking recommended when fine-tuning with LoRA? Is there some benefit elsewhere?
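
For context, my understanding is that the flag only changes how the label tensor is built during data preparation, roughly like this (a rough sketch with illustrative names, not the actual code in scripts/prepare_alpaca.py):

```python
import torch

# Illustrative; the value must match whatever index the loss function ignores
# (-1 here, -100 in HF-style training code).
IGNORE_INDEX = -1

def build_labels(encoded_prompt: torch.Tensor,
                 encoded_prompt_and_response: torch.Tensor,
                 mask_inputs: bool) -> torch.Tensor:
    """Build the label tensor for one training example.

    With mask_inputs=True, every position belonging to the prompt is replaced
    by IGNORE_INDEX, so the loss is computed only on the response tokens.
    With mask_inputs=False the model is also trained to reproduce the prompt.
    """
    labels = encoded_prompt_and_response.clone()
    if mask_inputs:
        labels[: len(encoded_prompt)] = IGNORE_INDEX
    return labels
```

Is that reading correct, and if so, when is it actually beneficial?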

sanjarbek16 commented 1 year ago

It depends on your specific use case. I noticed that it is better to mask when your dataset consists of long chat dialogues. When not masked, the model's responses become quite repetitive (repeating sentences from the context).

timothylimyl commented 1 year ago

Intuitively, if your task is QA, it makes sense to mask out the context and question so that the model focuses on learning how to answer.

That's how I did it over at my QA repo.
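
Concretely, the masked positions carry the ignore index and simply drop out of the cross-entropy. A minimal sketch of the loss side (assuming PyTorch's F.cross_entropy with ignore_index; not the exact code from my repo or from lit-llama):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -1  # illustrative; must match the value used during data prep

def masked_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab_size), labels: (batch, seq_len)."""
    # Shift so the model predicts token t+1 from tokens up to t.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    # Positions equal to IGNORE_INDEX contribute nothing to the loss,
    # so gradients come only from the unmasked (answer) tokens.
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=IGNORE_INDEX,
    )
```

The gradients then come only from the answer tokens, which is the whole point of masking out the context and question.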

aabhasgupta commented 3 months ago

I am facing a similar dilemma. If I want to fine-tune a model for summarization and I mask the chat I want to summarize, doesn't that hinder the model's ability to learn from the input or to capture nuanced relationships between the summary and the input chat?