EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License
4.4k stars 307 forks source link

Support for T5 Architecture #384

Open niranjanakella opened 5 months ago

niranjanakella commented 5 months ago

Hello @EricLBuehler, opening this issue as part of T5 Seq2Seq model architecture support in mistral.rs. (As discussed)

Relates to: #156

EricLBuehler commented 5 months ago

Hi @niranjanakella!

Thank you for opening this issue. Just to clarify, would this be a quantized or nonquantized implementation?

niranjanakella commented 5 months ago

@EricLBuehler Non-Quantized f16,32 implementation currently holds more precedence. But if possible, would also like to have a quantized implementation too.

Also I wish to know if LoRA adapters can be loaded at runtime without merging them into the model. It would be a huge game changer for most applications given the fact that many developers train multiple adapters. Would be great to attach multiple adapters during runtime.

EricLBuehler commented 5 months ago

Non-Quantized f16,32 implementation currently holds more precedence. But if possible, would also like to have a quantized implementation too.

Sounds great, I'll get started on an implementation.

Also I wish to know if LoRA adapters can be loaded at runtime without merging them into the model. It would be a huge game changer for most applications given the fact that many developers train multiple adapters. Would be great to attach multiple adapters during runtime.

We actually have this feature already! There are 2 ways to do this: 1) Activate adapters at runtime by preloading some and then sending requests to activate adapters 2) Use per-request adapter specification to have granular control.

Docs: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/ADAPTER_MODELS.md#adapter-model-dynamic-adapter-activation.

EricLBuehler commented 4 months ago

Hi @niranjanakella! Sorry for the delay; I have been busy with the Idefics 2 implementation (#309). I should have a prototype ready tonight, though!

niranjanakella commented 4 months ago

@EricLBuehler No problem sounds good. I am looking forward to trying it out soon.

EricLBuehler commented 4 months ago

See: #432.

cyanic-selkie commented 3 months ago

Hi, is there any news for this? Is the PR in a usable state? I have the exact same use case, albeit with a quantized model.