Open KyTiXo opened 1 year ago
I started working on this project a little before the tcp_server PR was closed. I was disappointed when it closed, since I knew the branch would eventually fall behind, but I thought I might as well release what I have.
The PR author created his own fork which he maintained up until a few weeks ago. I was working on a new version of llama-playground using his fork, but it seems like there's no point continuing since he has stopped updating his fork.
When I have time, I might take a look at llama-rs PR #37, which adds an HTTP server to llama-rs. While llama-rs isn't llama.cpp, the project is heavily based on llama.cpp's code -- advertising itself as "a Rust port of the llama.cpp project." It also looks to be very actively maintained.
I saw the Rust port as well, but after reading the issues and discussions, it seems it still needs many more optimizations compared to the C++ version.
I saw that a C API has now been added to llama.cpp, so I was thinking about trying to make a Rust wrapper (instead of a port), but I'm not sure I know enough about it to pull it off.
Actually, there is a Go wrapper that looks great and appears to use the C API, building against the latest version of llama.cpp.
https://github.com/go-skynet/go-llama.cpp
Maybe creating a TCP server on top of that would be a better plan.
That looks interesting. I'm not an expert at Go, but looking at the example main.go file, it doesn't seem like it would be too difficult to create a TCP server.
I might take a shot at creating it sometime in the future.
As a SvelteKit fan, I found this project and wanted to say thank you! I'm curious...
What are your thoughts now that the TCP branch seems to be falling behind main? I'm not familiar enough with C++ to know how to update it, so maybe there is something obvious I'm missing. The other binding libs don't seem to work as well as your implementation with the TCP server.
I feel like a ton of optimizations have landed in main in the last 3 weeks.
Is there another API implementation that's as viable as the TCP server that doesn't require reloading the model each time but will easily integrate with T3 or SK?