fire opened this issue 10 months ago
@aiaimimi0920 You may be interested, I am not able to work on this much.
@fire Indeed. For the sake of my AI robot, I have attempted to implement a plugin using llama.cpp (https://github.com/ggerganov/llama.cpp).
But the problem with running llama.cpp locally is:
So I ultimately turned to these two options.
If you are still interested in using llama for deployment in Godot, I may be able to provide some assistance.
But I am currently trying to replace Godot's TTS with SummerTTS (https://github.com/huakunyang/SummerTTS), compiled into a GDExtension; I hope my mimi AI can speak with emotional voices.
After finishing this, I will come to help you.
My plan before I decided to focus on V-Sekai was to evaluate performance on https://huggingface.co/TheBloke/Nous-Hermes-2-SOLAR-10.7B-GGUF
I haven't tested this model yet (https://huggingface.co/TheBloke/Nous-Hermes-2-SOLAR-10.7B-GGUF); it seems to be a new one.
You can try running it using llama.cpp; llama.cpp is easy to deploy and run.
I believe the key reference criteria for using this model are:
I am happy to help you complete this project, as it is also what I need.
https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md can be used to constrain output to a JSON grammar, so you could, for example, generate an animation tree or travel commands for an NPC.
With a sufficiently long context, as with RWKV, one could also use https://github.com/lucidrains/meshgpt-pytorch to generate Godot Engine resources, such as packed scenes.
Combined with a GBNF grammar, the generated output may even be guaranteed always valid.
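As a rough sketch of the grammar idea above (the command schema and rule names here are hypothetical, not from any existing project), a small GBNF file for llama.cpp could constrain sampling to a tiny NPC travel command:

```gbnf
# Hypothetical schema: {"action": "move", "target": "village"}
root   ::= "{" ws "\"action\"" ws ":" ws action "," ws "\"target\"" ws ":" ws string ws "}"
action ::= "\"move\"" | "\"follow\"" | "\"stop\""
string ::= "\"" [a-zA-Z_]* "\""
ws     ::= [ \t\n]*
```

Passed via `--grammar-file`, llama.cpp rejects any token that would leave the grammar, so the emitted JSON is valid by construction rather than by luck.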
How about it? Do you have any performance test reports or anything like that?
I have roughly completed the compilation steps for TTS (https://github.com/huakunyang/SummerTTS/issues/35),
and I should be able to help you soon.
I tried using the SOLAR model in LM Studio, and it was impressive until the context size was exceeded and it stopped working; I believe that is normal. There are evaluation metric charts on Hugging Face, but it's not the same as using it. When GPU-offloaded, the performance had an instant feeling in LM Studio, which also uses llama.cpp. I believe RWKV is a llama.cpp fork for unlimited context.
Amazing, I will also test it. If possible, let's bring it to Godot.
I think the most interesting use case is generating a Godot Engine resource (".tres") directly.
The .tscn example doesn't work well, but the JSON one works.
Here's a photo.
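To illustrate the JSON route mentioned above, one way to get ".tres" output without asking the model for the format directly is to have it emit grammar-constrained JSON and convert that afterwards. This is a minimal sketch (the `json_to_tres` helper and the flat schema are hypothetical, not part of godot-llama):

```python
import json

def json_to_tres(json_text: str) -> str:
    """Convert a flat JSON description (hypothetical schema) into a
    minimal Godot 4 ".tres" text resource. Only scalar values are
    handled in this sketch."""
    data = json.loads(json_text)
    res_type = data.pop("type", "Resource")
    lines = [f'[gd_resource type="{res_type}" format=3]', "", "[resource]"]
    for key, value in data.items():
        if isinstance(value, bool):
            lines.append(f"{key} = {str(value).lower()}")  # Godot uses true/false
        elif isinstance(value, str):
            lines.append(f'{key} = "{value}"')
        else:
            lines.append(f"{key} = {value}")  # ints and floats pass through
    return "\n".join(lines) + "\n"

# Example: JSON as the model might emit it under a GBNF grammar
model_output = '{"type": "Animation", "length": 2.0, "loop_mode": 0}'
print(json_to_tres(model_output))
```

Validating the JSON against a schema before conversion would also give a clean place to reject malformed model output instead of writing a broken resource file.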
So, is the current approach still to run this against a remote/hosted model? I was really interested in running models locally; sure, they may be big and slowish at the moment, but they will get smaller over time! Also curious: does this not overlap with V-Sekai/iree.gd?
It does overlap. Iree.gd is functional, this isn’t.
I'll probably archive godot-llama if there's no work done on it in the next few weeks.
I revived godot-llama because of exceptional performance from PHI-3. https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF
Updated the llama-cpp library but ran out of time to update godot-llama. Looking for help.