Dear Authors of the rulm Project,

Thank you for your excellent work on the rulm project. I would like to propose several enhancements that extend its capabilities.
New Configurations Added
self_instruct/configs/mistral_7b_128k.json
self_instruct/configs/mpt_30b.json
self_instruct/configs/mpt_7b_8k.json
self_instruct/configs/mpt_7b_storywriter.json
These configurations add support for more models.
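For context, here is a minimal Python sketch of how one of these configs could be inspected. The key names (model_name, tokenizer_name, use_fast) are assumptions for illustration only; the actual schema follows the existing rulm configs.

```python
import json

# Minimal sketch (assumed key names, not the project's actual schema):
# load one of the new configs and look at the fields that the updated
# train.py is expected to consume.
with open("self_instruct/configs/mpt_7b_storywriter.json") as f:
    config = json.load(f)

print(config.get("model_name"))      # e.g. "mosaicml/mpt-7b-storywriter"
print(config.get("tokenizer_name"))  # e.g. "EleutherAI/gpt-neox-20b"
print(config.get("use_fast"))        # MPT tokenizers require the fast version
```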
Modifications to self_instruct/src/train.py
I have added support for the following new options in train.py:
tokenizer_name - allows using a tokenizer that differs from the model. For example, when training MosaicML MPT models, the EleutherAI/gpt-neox-20b tokenizer is used even though the model name is something like mosaicml/mpt-7b-storywriter;
use_fast - the tokenizer used by MPT models does not work with use_fast=False, so this option makes it possible to keep the fast tokenizer enabled;
use_flash_attention_2 support for 4-bit training mode - this improves training efficiency in 4-bit scenarios. A sketch of how these options map onto Hugging Face calls follows this list.
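Below is a minimal, hedged sketch of how these options typically map onto Hugging Face transformers calls. It is an illustration under assumptions, not the actual train.py code: the variable names and the Mistral checkpoint are mine, and use_flash_attention_2 refers to the from_pretrained argument available in recent transformers versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative sketch only; the real wiring lives in self_instruct/src/train.py.
# tokenizer_name / use_fast: load a tokenizer that differs from the model.
model_name = "mosaicml/mpt-7b-storywriter"
tokenizer_name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(
    tokenizer_name if tokenizer_name else model_name,
    use_fast=True,  # MPT tokenizers do not work with use_fast=False
)

# use_flash_attention_2 in 4-bit mode: combine bitsandbytes 4-bit loading with
# FlashAttention-2 (shown with a Mistral checkpoint, which supports it natively;
# the actual model is set in the corresponding config).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=quant_config,
    use_flash_attention_2=True,
    torch_dtype=torch.bfloat16,
)
```

In the actual script these values come from the config file and command-line options rather than being hard-coded.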
I believe these additions will make the rulm project more versatile. I look forward to your feedback and am ready to make any adjustments you suggest.