What is the training target, and how is the loss calculated in pre-training and fine-tuning?

OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

GNU General Public License v3.0


FYI, LLaMA2-Accessory has detailed documentation and tidier code, so it may be a better choice if you want to customize LLM training.
Generally speaking, the training target for an LLM is next-token prediction, whether for pre-training or fine-tuning, and whether single-modal or multi-modal. Given a sequence of word tokens, the model predicts each token from all the tokens before it (this is enforced by causal attention).
Therefore, if you want to train on your own dataset, the simplest way is to prepare your data in the same format as ours and train on it. In most cases, you don't need to change the loss function.
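
To make the next-token prediction objective described above concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the function names `log_softmax` and `next_token_loss` and the toy inputs are made up for illustration, and it assumes the model emits one row of vocabulary scores (logits) per input position. The key step is the one-position shift: the scores at position `t` are compared against the token at position `t + 1`.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def next_token_loss(logits, tokens):
    """Mean cross-entropy for next-token prediction (hypothetical helper).

    logits[t] holds the scores produced after the model has seen
    tokens[0..t] (causal attention), so it is scored against tokens[t + 1].
    """
    if len(logits) != len(tokens):
        raise ValueError("expected one row of logits per input token")
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form a target")
    total = 0.0
    # The last position has no following token, so it contributes no loss.
    for t in range(len(tokens) - 1):
        total -= log_softmax(logits[t])[tokens[t + 1]]
    return total / (len(tokens) - 1)

# Toy example: a vocabulary of 3 tokens and a sequence of 3 token ids.
tokens = [0, 2, 1]

# A model with no preference between tokens (all scores equal).
uniform_logits = [[0.0, 0.0, 0.0] for _ in tokens]
uniform_loss = next_token_loss(uniform_logits, tokens)

# A model that puts a high score on the correct next token at each position.
confident_logits = [
    [0.0, 0.0, 5.0],  # after token 0, the next token is 2
    [0.0, 5.0, 0.0],  # after token 2, the next token is 1
    [0.0, 0.0, 0.0],  # last position: unused, nothing follows it
]
confident_loss = next_token_loss(confident_logits, tokens)
```

With uniform scores the loss equals the log of the vocabulary size, and it drops as the model puts more probability on the correct next token. In this sketch the same function applies to pre-training and fine-tuning data alike; only the token sequences fed into it differ.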
