Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

MLM pretraining objective #89

Closed wj210 closed 10 months ago

wj210 commented 10 months ago

I understand that only causal masking is supported. Would it take much modification to support a masked language modelling (MLM) objective in the pretraining phase? I assume most of the work would be in the model definitions under the LLM folder? Thanks!

ChrisLiu6 commented 10 months ago

We've never attempted something similar before. As you pointed out, enabling MLM requires adjusting the model definition to: 1) mask some of the input tokens, and 2) replace the causal attention mask with a bidirectional one. Additionally, you'll need to change the following code so that the ground truth used for loss calculation is updated accordingly:

https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/bc4c8db3086fa795fac1af22185847c17e26b1cd/accessory/model/meta.py#L86
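For 1) and the label update, the standard recipe is BERT-style: select ~15% of positions as prediction targets, replace 80% of those with a mask token and 10% with random tokens, leave 10% unchanged, and set the labels at all non-target positions to an ignore index so the loss only covers masked tokens. A minimal sketch, assuming PyTorch; `mask_tokens`, `mask_token_id`, and `vocab_size` are illustrative names, not part of this repo:

```python
import torch

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, vocab_size: int,
                mlm_prob: float = 0.15, ignore_index: int = -100):
    """BERT-style masking: returns (masked_input_ids, labels)."""
    labels = input_ids.clone()
    # Choose ~mlm_prob of positions as prediction targets.
    target = torch.rand(input_ids.shape) < mlm_prob
    labels[~target] = ignore_index  # loss is computed only on target positions

    input_ids = input_ids.clone()
    # 80% of target positions -> mask token.
    replaced = target & (torch.rand(input_ids.shape) < 0.8)
    input_ids[replaced] = mask_token_id
    # Half of the remaining 20% -> a random token; the rest stay unchanged.
    randomized = target & ~replaced & (torch.rand(input_ids.shape) < 0.5)
    input_ids[randomized] = torch.randint(vocab_size, input_ids.shape)[randomized]
    return input_ids, labels
```

Note that the LLaMA tokenizer has no [MASK] token, so you would either need to add one (and resize the embeddings) or repurpose an unused token.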

You'll also have to tailor the dataset implementation to your specific needs. That's about all the guidance I can offer for your requirements.
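For 2), LLaMA-style attention typically takes an additive mask (0 where attention is allowed, -inf where it is blocked), with the causal variant being an upper-triangular -inf matrix; for MLM the mask becomes fully visible. A sketch under that additive-mask convention (`build_attention_mask` is an illustrative name):

```python
import torch

def build_attention_mask(seq_len: int, causal: bool) -> torch.Tensor:
    if causal:
        # Causal LM: token i may only attend to tokens <= i.
        mask = torch.full((seq_len, seq_len), float("-inf"))
        return torch.triu(mask, diagonal=1)
    # MLM: fully bidirectional; padding masks would be handled separately.
    return torch.zeros(seq_len, seq_len)
```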