foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0

feat: support custom tokenizers #190

Open kmehant opened 2 weeks ago

kmehant commented 2 weeks ago

At this point, the tokenizer_name_or_path argument is part of the PEFT config, which should not be the case: it is useful irrespective of whether a PEFT config is used, e.g. when providing a custom tokenizer. Given this, we should add a separate field, perhaps in the model args dataclass, that lets users supply a custom tokenizer path to be passed when the tokenizer instance is created.

If not provided, this field can default to model_name_or_path, which can be handled in the __post_init__ lifecycle method of the dataclass.
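A minimal sketch of the proposed change, assuming a ModelArguments dataclass holds the model args (the class name and the other fields here are illustrative, not the repo's actual definitions):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelArguments:
    model_name_or_path: str
    # Proposed field: custom tokenizer path, independent of any PEFT config.
    tokenizer_name_or_path: Optional[str] = None

    def __post_init__(self):
        # Fall back to the model path when no custom tokenizer is given.
        if self.tokenizer_name_or_path is None:
            self.tokenizer_name_or_path = self.model_name_or_path


# Default behavior: tokenizer path follows the model path.
args = ModelArguments(model_name_or_path="my-org/my-model")
print(args.tokenizer_name_or_path)  # → my-org/my-model

# Custom tokenizer: the explicit value is kept as-is.
args = ModelArguments(
    model_name_or_path="my-org/my-model",
    tokenizer_name_or_path="my-org/my-tokenizer",
)
print(args.tokenizer_name_or_path)  # → my-org/my-tokenizer
```

The resolved tokenizer_name_or_path would then be handed to the tokenizer constructor (e.g. AutoTokenizer.from_pretrained) instead of always reusing the model path.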