xbotter opened this issue 1 year ago
It's a good idea, but I'm not sure this is something that should be completed within this project. I have no idea how to build a high-performance server right now. Maybe @Oceania2018 and @saddam213 have some ideas about it? They are the authors of BotSharp and LLamaStack, respectively.
We can add a simple HTTP OpenAPI. If you need a more complete conversation service experience, you can use BotSharp to manage conversations and LLamaSharp as the LLM Provider.
I think this issue could be separated into two parts. One is the improvement of text-embedding generation mentioned in #239. The other is how to support multiple models and multiple contexts and switch between them with high performance. I'm not sure how to achieve the second point yet.
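To make the second point concrete, one common approach is to keep a bounded registry of loaded models and evict the least recently used one when a new model is requested. This is only an illustrative sketch in Python (the project itself is C#); `ModelRegistry` and the `loader` hook are hypothetical names, not part of LLamaSharp.

```python
from collections import OrderedDict

class ModelRegistry:
    """Keeps at most `capacity` models resident, evicting the least
    recently used one when a new model must be loaded.

    `loader` stands in for whatever actually loads model weights
    (a hypothetical hook, not a real LLamaSharp API)."""

    def __init__(self, loader, capacity=2):
        self.loader = loader
        self.capacity = capacity
        self._models = OrderedDict()  # model name -> loaded handle

    def get(self, name):
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        if len(self._models) >= self.capacity:
            # Evict the least recently used entry; a real implementation
            # would also free the native memory held by the handle here.
            self._models.popitem(last=False)
        self._models[name] = self.loader(name)
        return self._models[name]
```

The hard part in practice is not the bookkeeping but releasing native model memory safely while requests may still hold a context, which is presumably where the performance concern comes from.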
The LLamaStack WebApi supports most LLamaSharp features, including state saving etc.: https://api.llama-stack.com/swagger/index.html
Adding a new controller to map the different JSON inputs/outputs should be trivial, assuming there is nothing out of the ordinary in there.
However, LLamaStack uses custom executors and state management, so that could cause issues for people looking to use vanilla LLamaSharp.
So maybe the best option is to make a version of the WebAPI for LLamaSharp and add controllers for both the normal implementations and the OAI implementations.
Option 2 would require a new project, plus a project to share commonalities such as the ModelService, since that supports multiple models and contexts.
And if we go for option 2, we could even port the WPF app over.
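The "map the different JSON inputs/outputs" step amounts to translating the OpenAI chat-completion envelope to and from whatever the executor consumes. A minimal sketch of that mapping, in Python for brevity (the helper names `to_prompt` and `to_oai_response` are made up for illustration, not actual LLamaSharp or LLamaStack APIs):

```python
import time
import uuid

def to_prompt(oai_request):
    """Flatten OpenAI-style `messages` into a single prompt string,
    the kind of inbound mapping an OAI controller would perform
    before handing the text to an executor."""
    parts = [f"{m['role']}: {m['content']}" for m in oai_request["messages"]]
    return "\n".join(parts) + "\nassistant:"

def to_oai_response(model, completion_text):
    """Wrap raw completion text in an OpenAI chat.completion envelope,
    the outbound half of the mapping."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": completion_text},
            "finish_reason": "stop",
        }],
    }
```

Streaming responses (`chat.completion.chunk` over server-sent events) and token-usage accounting would need the same treatment, which is probably where any "out of the ordinary" work hides.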
The current WebApi can only serve as a sample project for testing, so I plan to move it to the Examples project. Then I will create a new LLamaSharp.WebApi class project that provides a web API similar to OpenAI's, serving as an ASP.NET Core endpoint and making it easy and fast to integrate into web API projects. However, memory management for model loading is indeed a significant issue.
Hello, I'd like to share with you a .NET 8 Minimal API demo project for locally hosting LlamaSharp-supported models behind an OpenAI-compatible API. It's in very early stages, but it should work as a starting point. Feel free to get inspiration from it. It was implemented to test open-source coding assistants in VSCode, like continue.dev.
Since there are already two external projects that do this (and there are probably more out there), perhaps we should not include it in this project and instead just update the readme to link to projects like LlamaStack and LlamaSharpApiServer?
Now, there are three. https://github.com/sangyuxiaowu/LLamaWorker
@sangyuxiaowu That's great! If you would like to let more people know about your work, feel free to open a PR to add this link to the readme.
Hi everyone, I could offer a basic/simple API for you:
This API manages multiple users and can be locked with a bearer token. If you are interested, let me know.
like https://github.com/ggerganov/llama.cpp/tree/master/examples/server