Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

MLM pretraining objective #89

Closed wj210 closed 10 months ago

wj210 commented 10 months ago

I understand that only causal masking is supported. Would it take much modification to support a masked language modelling (MLM) objective in the pretraining phase? I assume most of the work would be in the model definitions under the LLM folder? Thanks!

ChrisLiu6 commented 10 months ago

We've never attempted something similar before. As you pointed out, enabling MLM requires adjusting the model definition to: 1) mask some of the input tokens, and 2) replace the causal attention mask with a bidirectional one. Additionally, you'll need to change the following code so that the ground truth used for loss calculation is updated accordingly:

https://github.com/Alpha-VLLM/LLaMA2-Accessory/blob/bc4c8db3086fa795fac1af22185847c17e26b1cd/accessory/model/meta.py#L86
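For 1) and the label update, the standard recipe is BERT-style: select ~15% of positions as prediction targets, replace 80% of those with a mask token and 10% with random tokens, leave 10% unchanged, and set the labels at all non-target positions to an ignore index so the loss only covers masked tokens. A minimal sketch, assuming PyTorch; `mask_tokens`, `mask_token_id`, and `vocab_size` are illustrative names, not part of this repo:

```python
import torch

def mask_tokens(input_ids: torch.Tensor, mask_token_id: int, vocab_size: int,
                mlm_prob: float = 0.15, ignore_index: int = -100):
    """BERT-style masking: returns (masked_input_ids, labels)."""
    labels = input_ids.clone()
    # Choose ~mlm_prob of positions as prediction targets.
    target = torch.rand(input_ids.shape) < mlm_prob
    labels[~target] = ignore_index  # loss is computed only on target positions

    input_ids = input_ids.clone()
    # 80% of target positions -> mask token.
    replaced = target & (torch.rand(input_ids.shape) < 0.8)
    input_ids[replaced] = mask_token_id
    # Half of the remaining 20% -> a random token; the rest stay unchanged.
    randomized = target & ~replaced & (torch.rand(input_ids.shape) < 0.5)
    input_ids[randomized] = torch.randint(vocab_size, input_ids.shape)[randomized]
    return input_ids, labels
```

Note that the LLaMA tokenizer has no [MASK] token, so you would either need to add one (and resize the embeddings) or repurpose an unused token.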

You'll also have to tailor the dataset implementation to your specific needs. That's about all the guidance I can offer for your requirements.
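For 2), LLaMA-style attention typically takes an additive mask (0 where attention is allowed, -inf where it is blocked), with the causal variant being an upper-triangular -inf matrix; for MLM the mask becomes fully visible. A sketch under that additive-mask convention (`build_attention_mask` is an illustrative name):

```python
import torch

def build_attention_mask(seq_len: int, causal: bool) -> torch.Tensor:
    if causal:
        # Causal LM: token i may only attend to tokens <= i.
        mask = torch.full((seq_len, seq_len), float("-inf"))
        return torch.triu(mask, diagonal=1)
    # MLM: fully bidirectional; padding masks would be handled separately.
    return torch.zeros(seq_len, seq_len)
```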