bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

add file level FIM and sanity check #81

Closed loubnabnl closed 1 year ago

loubnabnl commented 1 year ago

This copies @RaymondLi0's implementation of file-level FIM (fill-in-the-middle) to Megatron:

       --fim-rate 0.5 \
       --fim-split-sample \"<file_sep>\" \
       --fragment-fim-rate 0.5 \
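For readers unfamiliar with the idea, here is a minimal sketch of what these three flags might control. It is not the code in this PR: it works on plain strings rather than token ids, the function names `fim_permute` and `file_level_fim` are hypothetical, the sentinel strings `<fim_prefix>`, `<fim_middle>`, `<fim_suffix>` are assumed, and the interpretation of the flags (a sample is selected with probability `fim_rate`, split on the `fim_split_sample` string, and each fragment is then permuted with probability `fragment_fim_rate`) is an assumption based on the flag names.

```python
import random

# Assumed sentinel strings; the real tokenizer may use different ones.
FIM_PREFIX, FIM_MIDDLE, FIM_SUFFIX = "<fim_prefix>", "<fim_middle>", "<fim_suffix>"


def fim_permute(text, rng):
    """Cut text at two random points and emit it in prefix-suffix-middle order."""
    lo, hi = sorted(rng.randint(0, len(text)) for _ in range(2))
    prefix, middle, suffix = text[:lo], text[lo:hi], text[hi:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def file_level_fim(sample, rng, fim_rate=0.5, fragment_fim_rate=0.5,
                   split="<file_sep>"):
    """Apply FIM per file instead of across the whole concatenated sample."""
    # With probability 1 - fim_rate the sample is left untouched.
    if rng.random() >= fim_rate:
        return sample
    # Split on the file separator so a cut never spans two files.
    fragments = sample.split(split)
    out = [
        fim_permute(frag, rng) if rng.random() < fragment_fim_rate else frag
        for frag in fragments
    ]
    return split.join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = "def a():\n    return 1\n<file_sep>def b():\n    return 2\n"
    print(file_level_fim(sample, rng, fim_rate=1.0, fragment_fim_rate=1.0))
```

Splitting before permuting keeps each prefix, middle, and suffix inside a single file, which is the point of doing FIM at file level rather than over a whole packed sample.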

I also added a dataloader sanity-check flag and its code from brrr; I can remove it if it's not needed.