epfLLM / Megatron-LLM

distributed trainer for LLMs

Support for Mistral #76

Closed: philschmid closed this issue 10 months ago

philschmid commented 11 months ago

Are you planning to add support for Mistral?

martinjaggi commented 11 months ago

We'd love to, but it requires a slight change (sliding window attention; a sketch follows below). We can have a look.

As it's a rather small model for now, we're not sure whether to prioritize Mistral or Falcon 180B first. What do you think?
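
For context, the sliding window change amounts to a banded causal mask: each token attends only to the previous `window` positions instead of the entire prefix. A minimal, illustrative sketch (not the trainer's code; `sliding_window_causal_mask` is a hypothetical helper):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: key j must not be in the future (j <= i)
    # and must lie within the last `window` positions (j > i - window).
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, seq_len)
    return (j <= i) & (j > i - window)

# Usage: apply before the softmax over the attention scores, e.g.
# scores.masked_fill_(~sliding_window_causal_mask(seq_len, 4096), float("-inf"))
```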

philschmid commented 11 months ago

We are seeing a lot of interest from the HF community in training Mistral, even if it is only 7B. I guess the question is whether epfLLM would improve fine-tuning/continuous pretraining of the model, e.g. make it faster or more efficient. If not, Falcon 180B is probably the right priority.

malteos commented 11 months ago

+1

A potential starting point could be this Mistral implementation: https://github.com/PygmalionAI/aphrodite-engine/blob/12e296b55675d5784acb69d736189ae0a9ca40a8/aphrodite/modeling/models/mistral.py

xingyaoww commented 11 months ago

I tried to add a preliminary Mistral implementation here (https://github.com/epfLLM/Megatron-LLM/pull/88#issue-1988719134). It currently relies on the latest version of FlashAttention for windowed attention, although the windowed path is only used when the sequence length exceeds 4096, which I currently don't have enough memory to test. Feel free to give it a try / test it!
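
For readers following along, here is a minimal sketch of what that windowed path could look like, assuming flash-attn >= 2.3 (where `flash_attn_func` accepts a `window_size` argument) and Mistral-7B's 4096-token window; the function name and the exact off-by-one convention are assumptions, not the code in PR #88:

```python
from flash_attn import flash_attn_func  # requires flash-attn >= 2.3 and a CUDA GPU

SLIDING_WINDOW = 4096  # Mistral-7B's sliding window size (taken from its config)

def mistral_attention(q, k, v):
    # q, k, v: [batch, seq_len, num_heads, head_dim], fp16/bf16 tensors on CUDA
    seq_len = q.shape[1]
    if seq_len > SLIDING_WINDOW:
        # restrict each query to the previous SLIDING_WINDOW keys; check the
        # reference implementation for the exact (left, right) convention
        return flash_attn_func(q, k, v, causal=True,
                               window_size=(SLIDING_WINDOW, 0))
    # at or below the window size, plain causal attention is equivalent
    return flash_attn_func(q, k, v, causal=True)
```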

martinjaggi commented 11 months ago

Thank you so much, this looks great.

@AleHD, @kylematoba, @mkrima, @mpagli, could one of you have a look at PR #88?

kylematoba commented 10 months ago

Taking a look.

martinjaggi commented 10 months ago

Closed by #88 and #90.