Open · Palmik opened this issue 1 year ago
We have created a script to convert models trained with QLoRA to CTranslate2 to speed up inference: https://github.com/Actable-AI/llm-utils/blob/main/qlora2ct2/convert_qlora2_ct2.py (the general flow is sketched below).
Any plans to support LoRAs directly? It would be great to be able to switch between LoRAs :)
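For reference, a minimal sketch of that merge-then-convert flow, assuming a peft-style QLoRA adapter (this is not the linked script verbatim; the model id and adapter path are placeholders):

```python
# Minimal sketch of the merge-then-convert flow (assumed, not the linked
# script verbatim). Model and adapter paths are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "base-model-id", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/qlora-adapter")
merged = model.merge_and_unload()  # folds the low-rank delta into the base weights
merged.save_pretrained("merged-model")

# Then convert the merged checkpoint with the standard CT2 converter:
#   ct2-transformers-converter --model merged-model \
#       --output_dir merged-model-ct2 --quantization int8_float16
```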
Big fan of CT2 here as well. Swapping LoRAs would enable the following use case: keep a coding model loaded (e.g. a top WizardCoder), check the intent of each message in the chat interface, and if it's not related to code generation, load a LoRA and run the prompt (see the sketch below). Using fine-tuned coding models for other purposes completely breaks their coding abilities, and this approach would allow creating a really good internal, universal LLM for developers.
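Something close to this already works on the HF/peft side via adapter switching; the ask is the equivalent capability in CT2. A hypothetical sketch, where the model id, adapter paths, and the `classify_intent()` helper are all placeholders:

```python
# Hypothetical routing sketch using peft's adapter switching on the HF side;
# the request here is the equivalent capability in CTranslate2.
# The model id, adapter paths, and classify_intent() are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "WizardLM/WizardCoder-15B-V1.0"  # example coding base model
base = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

model = PeftModel.from_pretrained(base, "path/to/chat-lora", adapter_name="chat")
model.load_adapter("path/to/code-lora", adapter_name="code")

def answer(prompt: str) -> str:
    # Route each message to the adapter matching its intent.
    intent = classify_intent(prompt)  # hypothetical intent classifier
    model.set_adapter("code" if intent == "code" else "chat")
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```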
Any plans to support LoRA weights directly, without needing to merge them?
Context: With HF models, one can use peft to do parameter-efficient tuning, the most popular (and AFAIK most performant) method being LoRA.
Idea: It would be great to be able to have a single instance (in GPU memory) of a base HF transformer model (running with CT2) that you run with multiple sets of LoRA weights.
Would be curious to hear if you think this could be done in CT2 in a generic way that's applicable to all HF transformer models (just like HF's peft).
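To illustrate why a shared base seems plausible: a LoRA layer only adds a low-rank correction on top of the frozen base weight, so the large W could in principle sit in GPU memory once while the small per-adapter A/B matrices are swapped in. A toy sketch of the math (illustrative shapes, not CT2 code):

```python
# Toy illustration of the LoRA computation (illustrative shapes, not CT2 code).
# The frozen base weight W is large and could be shared across requests;
# each adapter only contributes a small low-rank pair (A, B).
import torch

d, r = 4096, 16          # hidden size and LoRA rank (illustrative)
alpha = 32.0             # LoRA scaling factor

W = torch.randn(d, d)    # frozen base weight: d*d params, loaded once
A = torch.randn(r, d)    # per-adapter: only ~2*d*r params
B = torch.zeros(d, r)    # B starts at zero in LoRA, so the delta starts at 0

x = torch.randn(d)
y = W @ x + (alpha / r) * (B @ (A @ x))   # base output + low-rank correction
```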