Dear Authors of the rulm Project,

Thank you for your excellent work on the rulm project. I would like to propose several enhancements that extend its capabilities.
New Configurations Added
self_instruct/configs/mistral_7b_128k.json
self_instruct/configs/mpt_30b.json
self_instruct/configs/mpt_7b_8k.json
self_instruct/configs/mpt_7b_storywriter.json
These configurations add support for more models.
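For context, here is a minimal Python sketch of how one of these configs could be inspected. The key names (model_name, tokenizer_name, use_fast) are assumptions for illustration only; the actual schema follows the existing rulm configs.

```python
import json

# Minimal sketch (assumed key names, not the project's actual schema):
# load one of the new configs and look at the fields that the updated
# train.py is expected to consume.
with open("self_instruct/configs/mpt_7b_storywriter.json") as f:
    config = json.load(f)

print(config.get("model_name"))      # e.g. "mosaicml/mpt-7b-storywriter"
print(config.get("tokenizer_name"))  # e.g. "EleutherAI/gpt-neox-20b"
print(config.get("use_fast"))        # MPT tokenizers require the fast version
```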
Modifications to self_instruct/src/train.py
I have added support for the following new options in train.py:
tokenizer_name - allows using a tokenizer that differs from the model. For example, when training MosaicML MPT models, the EleutherAI/gpt-neox-20b tokenizer is used even though the model name is something like mosaicml/mpt-7b-storywriter;
use_fast - the tokenizer used by MPT models does not work with use_fast=False, so this option makes it possible to keep the fast tokenizer enabled;
use_flash_attention_2 support for 4-bit training mode - this improves training efficiency in 4-bit scenarios. A sketch of how these options map onto Hugging Face calls follows this list.
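Below is a minimal, hedged sketch of how these options typically map onto Hugging Face transformers calls. It is an illustration under assumptions, not the actual train.py code: the variable names and the Mistral checkpoint are mine, and use_flash_attention_2 refers to the from_pretrained argument available in recent transformers versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative sketch only; the real wiring lives in self_instruct/src/train.py.
# tokenizer_name / use_fast: load a tokenizer that differs from the model.
model_name = "mosaicml/mpt-7b-storywriter"
tokenizer_name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(
    tokenizer_name if tokenizer_name else model_name,
    use_fast=True,  # MPT tokenizers do not work with use_fast=False
)

# use_flash_attention_2 in 4-bit mode: combine bitsandbytes 4-bit loading with
# FlashAttention-2 (shown with a Mistral checkpoint, which supports it natively;
# the actual model is set in the corresponding config).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=quant_config,
    use_flash_attention_2=True,
    torch_dtype=torch.bfloat16,
)
```

In the actual script these values come from the config file and command-line options rather than being hard-coded.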
I believe these additions will make the rulm project more versatile. I look forward to your feedback and am ready to make any adjustments you suggest.