fermyon / feedback

Centralized repository for Fermyon Cloud feedback and bug reports

Ability to load custom models in Fermyon Cloud Serverless AI #31

Open mikkelhegn opened 11 months ago

mikkelhegn commented 11 months ago

There have been requests for the ability to load one's own model for the Serverless AI feature. This could, for instance, be a LoRA-tuned version of Llama 2.

benwilde commented 11 months ago

Thanks @mikkelhegn - happy to provide more details / help with testing this when it's appropriate. Starting by allowing Llama variants seems like a good first step in this direction!

tpmccallum commented 11 months ago

It is important to note that the GGUF model format has now superseded the GGML model format (which we currently support exclusively). A user who fine-tunes models using currently available Python libraries (e.g. via Google Colab, then posting to Hugging Face) will most likely produce GGUF output in every training/tuning scenario. So we should consider not only continuing to support the old GGML format (which requires the old llama-cpp-python version 0.1.49, among other caveats) but also supporting the newer GGUF format as part of this feature request, since new custom models will likely be GGUF-only. cc: @radu-matei @mikkelhegn @macolso @rylev
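As a concrete illustration of the format split described above: the two families can be told apart by their leading magic bytes. This is a minimal sketch; the magic values come from llama.cpp's file headers, and `detect_model_format` is a hypothetical helper for illustration, not part of any Fermyon or llama.cpp API.

```python
# Magic bytes as they appear on disk (the u32 magics are written little-endian).
GGUF_MAGIC = b"GGUF"  # GGUF files begin with the ASCII bytes "GGUF"
GGML_MAGICS = {
    b"lmgg": "ggml",  # 0x67676d6c, the original unversioned format
    b"fmgg": "ggmf",  # 0x67676d66, versioned ggml
    b"tjgg": "ggjt",  # 0x67676a74, mmap-able ggml
}

def detect_model_format(path: str) -> str:
    """Return 'gguf', a legacy ggml variant name, or 'unknown'."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == GGUF_MAGIC:
        return "gguf"
    return GGML_MAGICS.get(magic, "unknown")
```

A host that accepts user-uploaded models could use a check like this to reject (or route) legacy GGML files up front rather than failing at inference time.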