SciSharp / LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
https://scisharp.github.io/LLamaSharp
MIT License

Create HTTP API server and provide API like OAI #269

Open xbotter opened 10 months ago

xbotter commented 10 months ago

like https://github.com/ggerganov/llama.cpp/tree/master/examples/server

AsakusaRinne commented 10 months ago

It's a good idea, but I'm not sure whether it's something that should be completed in this project. I don't yet have a clear idea of how to build a high-performance server. Maybe @Oceania2018 and @saddam213 have some ideas about it? They are the authors of BotSharp and LLamaStack, respectively.

Oceania2018 commented 10 months ago

We can add a simple OpenAI-style HTTP API. If you need a more complete conversation service experience, you can use BotSharp to manage conversations and LLamaSharp as the LLM provider.
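
For illustration only, here is a minimal sketch of what such a "simple OpenAI-style HTTP API" could look like, using an ASP.NET Core minimal API with LLamaSharp's `StatelessExecutor` as the provider. The model path, the `CompletionRequest` record, and the response shape are placeholders that only loosely mirror the OpenAI completions schema; this is not code from any of the projects discussed here.

```csharp
// Minimal sketch of an OpenAI-style completion endpoint backed by LLamaSharp.
// Assumes an ASP.NET Core project (Microsoft.NET.Sdk.Web, implicit usings enabled)
// with the LLamaSharp package and a llama.cpp backend package referenced.
// The model path and the request/response shapes are placeholders.
using LLama;
using LLama.Common;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Load the model once at startup and reuse it for every request.
var parameters = new ModelParams("models/llama-2-7b-chat.Q4_K_M.gguf") { ContextSize = 2048 };
using var weights = LLamaWeights.LoadFromFile(parameters);
var executor = new StatelessExecutor(weights, parameters);

app.MapPost("/v1/completions", async (CompletionRequest request) =>
{
    var inferenceParams = new InferenceParams { MaxTokens = request.MaxTokens };

    // Collect the streamed tokens into a single response string.
    var text = new System.Text.StringBuilder();
    await foreach (var token in executor.InferAsync(request.Prompt, inferenceParams))
        text.Append(token);

    return Results.Ok(new { choices = new[] { new { text = text.ToString() } } });
});

app.Run();

// Simplified request shape; a real OAI-compatible server would mirror the full schema.
record CompletionRequest(string Prompt, int MaxTokens = 256);
```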

AsakusaRinne commented 10 months ago

I think this issue can be separated into two parts. One is the improvement of text-embedding generation mentioned in #239. The other is how to support multiple models and multiple contexts and switch between them with high performance. I'm not sure how to achieve the second point yet.
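
One possible, purely illustrative approach to the second point is a small registry that loads weights lazily and caches them per model file, so requests can switch models without reloading. The `ModelRegistry` class and its members below are hypothetical names invented for this sketch, not an existing LLamaSharp API, and this is not the approach the thread settled on.

```csharp
// Hypothetical multi-model registry sketch; none of these names exist in LLamaSharp itself.
using System;
using System.Collections.Concurrent;
using LLama;
using LLama.Common;

public sealed class ModelRegistry : IDisposable
{
    private readonly ConcurrentDictionary<string, LLamaWeights> _models = new();

    // Load weights on first use and reuse them for later requests to the same model.
    public LLamaWeights GetOrLoad(string modelPath, ModelParams parameters)
        => _models.GetOrAdd(modelPath, _ => LLamaWeights.LoadFromFile(parameters));

    // Each request can still create its own short-lived context from the shared weights.
    public LLamaContext CreateContext(string modelPath, ModelParams parameters)
        => GetOrLoad(modelPath, parameters).CreateContext(parameters);

    public void Dispose()
    {
        foreach (var weights in _models.Values)
            weights.Dispose();
    }
}
```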

saddam213 commented 10 months ago

The LLamaStack WebApi supports most of LLamaSharp's features, including state saving, etc.: https://api.llama-stack.com/swagger/index.html

Adding a new controller to map the different JSON inputs/outputs should be trivial, assuming there is nothing out of the ordinary in there.

However, LLamaStack uses custom executors and state management, so that could cause issues for people looking to use vanilla LLamaSharp.

So maybe the best option is to make a version of the WebApi for LLamaSharp and add controllers for both the normal and OAI implementations:

  1. Modify the Web example to be both a Web and a WebApi example
  2. Modify the existing WebApi to add the new API

Option 2 would require a new project, plus a project to share commonalities, like the ModelService, as that supports multiple models and contexts.

And if we go for option 2, we could even port the WPF app over.
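
To make the controller-mapping idea above concrete, here is a rough, purely illustrative sketch of an OAI-style controller sitting on top of an existing text-generation service. `ITextGenerationService` and the record types are invented for this example and are not part of LLamaStack or LLamaSharp; a real implementation would mirror the full OpenAI chat completions schema.

```csharp
// Hypothetical controller translating OpenAI-style JSON into an existing generation service.
// ITextGenerationService and the record types are illustrative only.
// Assumes a standard ASP.NET Core project with implicit usings, AddControllers() and MapControllers().
using Microsoft.AspNetCore.Mvc;

public interface ITextGenerationService
{
    Task<string> GenerateAsync(string prompt, int maxTokens, CancellationToken ct);
}

public record ChatMessage(string Role, string Content);
public record ChatCompletionRequest(string Model, List<ChatMessage> Messages, int MaxTokens = 256);
public record ChatCompletionChoice(int Index, ChatMessage Message, string FinishReason);
public record ChatCompletionResponse(string Id, string Object, IReadOnlyList<ChatCompletionChoice> Choices);

[ApiController]
[Route("v1/chat/completions")]
public class ChatCompletionsController : ControllerBase
{
    private readonly ITextGenerationService _service;

    public ChatCompletionsController(ITextGenerationService service) => _service = service;

    [HttpPost]
    public async Task<ChatCompletionResponse> Post(ChatCompletionRequest request, CancellationToken ct)
    {
        // Very naive prompt construction from the chat history; a real mapping would use a chat template.
        var prompt = string.Join("\n", request.Messages.Select(m => $"{m.Role}: {m.Content}"));
        var completion = await _service.GenerateAsync(prompt, request.MaxTokens, ct);

        return new ChatCompletionResponse(
            Id: $"chatcmpl-{Guid.NewGuid():N}",
            Object: "chat.completion",
            Choices: new[] { new ChatCompletionChoice(0, new ChatMessage("assistant", completion), "stop") });
    }
}
```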

xbotter commented 10 months ago

The current WebApi can only be used as a sample project for testing. I plan to move it to the Examples project. Then I will create a new LLamaSharp.WebApi class library project that provides a web API similar to OAI's, serving as an endpoint for ASP.NET Core, making it easy and fast to integrate into web API projects. However, memory management for model loading is indeed a significant issue.
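
The kind of integration described here could, for example, take the shape of a registration extension that a host project calls at startup. The names `AddLLamaSharpOAI` and `LLamaSharpOAIOptions` below are hypothetical, used only to illustrate the idea, and are not an actual LLamaSharp.WebApi API.

```csharp
// Hypothetical registration helper for an OAI-style LLamaSharp endpoint package;
// the option and method names are invented to illustrate the integration, not real APIs.
using System;
using LLama;
using LLama.Common;
using Microsoft.Extensions.DependencyInjection;

public class LLamaSharpOAIOptions
{
    public string ModelPath { get; set; } = "";
    public uint ContextSize { get; set; } = 2048;
}

public static class LLamaSharpWebApiExtensions
{
    // Loads one model at startup and shares the weights across all requests.
    public static IServiceCollection AddLLamaSharpOAI(this IServiceCollection services, Action<LLamaSharpOAIOptions> configure)
    {
        var options = new LLamaSharpOAIOptions();
        configure(options);

        var parameters = new ModelParams(options.ModelPath) { ContextSize = options.ContextSize };
        services.AddSingleton(parameters);
        services.AddSingleton(_ => LLamaWeights.LoadFromFile(parameters));
        services.AddSingleton(sp => new StatelessExecutor(sp.GetRequiredService<LLamaWeights>(), parameters));
        return services;
    }
}
```

A host project could then call something like `builder.Services.AddLLamaSharpOAI(o => o.ModelPath = "models/model.gguf")` and map whatever controllers or minimal-API endpoints such a package exposes.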

gianni-rg commented 10 months ago

Hello, I'd like to share a .NET 8 Minimal API demo project that locally hosts LLamaSharp-supported models behind an OpenAI-compatible API. It's in very early stages, but it should work as a starting point; feel free to take inspiration from it. It was implemented to test open-source coding assistants in VSCode, like continue.dev.

https://github.com/gianni-rg/LlamaSharpApiServer

martindevans commented 10 months ago

Since there are already two external projects that do this (and there are probably more out there), perhaps we should not include it in this project and instead just update the readme to link to projects like LlamaStack and LlamaSharpApiServer?

sangyuxiaowu commented 3 months ago

Now, there are three. https://github.com/sangyuxiaowu/LLamaWorker

AsakusaRinne commented 3 months ago

@sangyuxiaowu That's great! If you would like to let more people know about your work, feel free to open a PR to add this link to the README.

Coopaguard commented 2 months ago

Hi everyone, I could share a basic/simple API with you:

(screenshots attached in the original comment)

This API manages multiple users and can be locked with a bearer token. If you are interested, let me know.
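
For illustration only, one simple way to lock such an API behind a bearer token in ASP.NET Core is a small middleware check on the `Authorization` header. This is not code from the project above; the `Api:BearerToken` configuration key is an assumption made for this sketch.

```csharp
// Minimal sketch of a bearer-token check in ASP.NET Core middleware.
// Assumes an ASP.NET Core project with implicit usings; "Api:BearerToken" is an assumed config key.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var expectedToken = app.Configuration["Api:BearerToken"];

app.Use(async (context, next) =>
{
    // Reject requests that do not carry the expected "Authorization: Bearer <token>" header.
    var header = context.Request.Headers.Authorization.ToString();
    if (string.IsNullOrEmpty(expectedToken) || header != $"Bearer {expectedToken}")
    {
        context.Response.StatusCode = StatusCodes.Status401Unauthorized;
        return;
    }
    await next();
});

app.MapGet("/health", () => Results.Ok("ok"));

app.Run();
```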