c0sogi / llama-api

An OpenAI-like LLaMA inference API
MIT License

Any way to define embeddings model in model_definitions.py? #19

Open morgendigital opened 11 months ago

morgendigital commented 11 months ago

First of all, thank you for creating llama-api, it really works great! Just wanted to ask: is there a possibility to add embeddings models as well to the model_definitions.py?

It seems that automatic downloads sometimes get corrupted or time out. I tried it with a smaller embeddings model and everything worked fine: it cached the model, and embeddings work correctly. But anything over roughly 100 MB times out at some point, and I'm not sure why.

Alternatively, is there any way to manually put an embeddings model into the .cache folder? I'm not really sure about the structure here; it looks quite different from a regular model directory that I would download on my own.

Thank you!

PS: Happy to contribute a bit to the codebase if it is still actively maintained, as we will probably make some changes for better production serving. Even if it's just the readme file, to explain how to serve it in production behind Nginx with load balancing and multiple instances on one server.
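For reference, the multi-instance setup described above could be sketched with a minimal Nginx config along these lines. This is only an illustration; the ports, the `least_conn` policy, and the timeout value are assumptions, not anything documented by llama-api:

```nginx
# Hypothetical: two llama-api instances on one server, behind Nginx.
upstream llama_api {
    least_conn;                   # route each request to the least-busy instance
    server 127.0.0.1:8000;        # instance 1 (assumed port)
    server 127.0.0.1:8001;        # instance 2 (assumed port)
}

server {
    listen 80;

    location / {
        proxy_pass http://llama_api;
        proxy_read_timeout 300s;  # generation responses can stream slowly
        proxy_set_header Host $host;
    }
}
```

Each backend would be a separate llama-api process started on its own port; Nginx then distributes incoming OpenAI-style requests across them.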

morgendigital commented 11 months ago

I have solved it temporarily by replicating the folder and file structure of the smaller embeddings model that was successfully cached, renaming the relevant parts to their SHA-256 checksums, etc. It was a bit dirty, but it worked :-)
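For anyone attempting the same workaround: the layout being imitated resembles the Hugging Face hub cache, where file contents are stored under `blobs/` named by their SHA-256 digest, and `snapshots/<revision>/` holds symlinks to those blobs. A rough sketch of placing a locally downloaded file into such a layout, assuming that cache structure (the paths and revision name here are guesses about the cache, not a documented llama-api API):

```python
import hashlib
import shutil
from pathlib import Path


def place_in_cache(src: Path, cache_root: Path, revision: str = "main") -> Path:
    """Copy `src` into a hub-style cache: blobs/<sha256 of contents>,
    plus a snapshots/<revision>/<filename> symlink pointing at the blob."""
    # Name the blob after the SHA-256 checksum of the file contents.
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    blob = cache_root / "blobs" / digest
    blob.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, blob)

    # The snapshot entry keeps the human-readable filename and links to the blob.
    link = cache_root / "snapshots" / revision / src.name
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(blob.resolve())
    return link
```

Doing this for each file of the model (weights, config, tokenizer) reproduces the "rename to checksum" trick above in a repeatable way, without hand-editing the cache.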