argonne-lcf / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2
Merge in `tokenizer-tests` branch into `main` #17

Closed: saforem2 closed this 3 months ago

saforem2 commented 3 months ago

This PR adds the ability to switch between different tokenizers dynamically at runtime[^flash-attn-sunspot].
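
As a rough illustration of what runtime tokenizer switching can look like, here is a minimal sketch. It is not the code from this PR: the `TOKENIZERS` registry, the `get_tokenizer` and `tokenize` helpers, and the two toy tokenizers (`"whitespace"`, `"char"`) are all hypothetical names chosen for the example, and they are not the tokenizers this PR supports.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a name to a tokenizer function.
# These toy tokenizers are illustrative only; they are NOT the
# tokenizers supported by this PR.
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {
    "whitespace": lambda text: text.split(),
    "char": lambda text: list(text),
}


def get_tokenizer(name: str) -> Callable[[str], List[str]]:
    """Look up a tokenizer by name, failing loudly on unknown names."""
    try:
        return TOKENIZERS[name]
    except KeyError:
        raise ValueError(
            f"Unknown tokenizer {name!r}; choose from {sorted(TOKENIZERS)}"
        ) from None


def tokenize(text: str, tokenizer_name: str) -> List[str]:
    """Tokenize `text` with whichever tokenizer is selected at call time."""
    return get_tokenizer(tokenizer_name)(text)


if __name__ == "__main__":
    # The same input, tokenized two ways without restarting the program.
    for name in ("whitespace", "char"):
        print(name, tokenize("flash attn", name))
```

Selecting the tokenizer by name at call time, rather than importing one implementation at module load, is what allows the choice to come from a command-line argument or config value.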

Supported tokenizers:

[^flash-attn-sunspot]: This is part of an effort to better understand the behavior of flash-attn on Sunspot. For additional details, see: 📸 flash-attn on Sunspot