sangheonEN opened 1 month ago
I have many projects integrating RealtimeSTT with LLMs you could have a look at. For example, here is a simple one from the RealtimeSTT test folder. This one here has a more sophisticated LLM integration. And this here, my main project, is a personal assistant which also includes function calling.
Thank you very much for your answer.
In addition, I have a question about combining STT and LLM!
I am using a faster-whisper Korean model, and for the LLM I want to use the Llama-3-Ko model from Hugging Face (https://huggingface.co/gagagiga/Llama-3-Open-Ko-8B-Q4_K_M-GGUF/tree/main).
Then I will use the open-source project at https://github.com/KoljaB/LocalEmotionalAIVoiceChat and modify the config below to use the ollama model:
```python
elif config.llm_provider == "ollama":
    from llm_ollama.llm_handler import LLMHandler
```
Is it okay to run the LLM model on the server with the ollama run model_name command and then run the main code?
Is there anything else I need to do?
And can I refer to that code for a simple service on a web server?
https://github.com/KoljaB/RealtimeSTT/tree/master/example_webserver
And is it available only on Windows OS?
If the https://github.com/KoljaB/RealtimeSTT/tree/master/server code works on Ubuntu, then on a single PC running Ubuntu under WSL2 I could use two Docker containers: one running the web server code (https://github.com/KoljaB/RealtimeSTT/tree/master/server) and the other acting as a client.
And if you run pip install RealtimeSTT in an Anaconda virtual environment, the audio_recorder.py file is created at "C:\Users\USER\anaconda3\envs\realtimestt_env\Lib\site-packages\RealtimeSTT\audio_recorder.py".
Then the problem is that whenever you update the source code, I want to merge the updated code with the previous version and with the code I am writing based on it. Do I have to clone your git repository and keep adding and removing only the modified parts? Is there no easy way to keep this up to date?
In other words, I was developing against a previous version of your code, and when a new version is released I want to merge the updated code in one step.
There are four questions in total, so please answer them.
> Is it okay to run the LLM model on the server with the ollama run model_name command and then run the main code?
That is correct. You first need to execute "ollama run model_name".
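Since the model you linked is a GGUF file, it first has to be registered with ollama, for example with a Modelfile whose FROM line points at the downloaded .gguf file, followed by `ollama create llama3-ko -f Modelfile`; after that, `ollama run llama3-ko` works as usual. As a quick sanity check before starting the main code, a minimal sketch like the one below (the model name llama3-ko and the default port 11434 are assumptions) asks the local ollama API for a completion:

```python
# Minimal sanity check: request a completion from the local ollama server.
# Assumes ollama is running and the GGUF was registered as "llama3-ko".
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3-ko",   # the name you gave the model in ollama
        "prompt": "안녕하세요, 간단히 자기소개를 해주세요.",
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```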
> Is there anything else I need to do?
Yes. You need to open llm_ollama/completion_params.json and put the model name into the "model" parameter there.
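If you prefer to script that step, a small sketch like this would do it (the file path comes from the answer above; the model name is only an example and has to match the name registered in ollama):

```python
# Point the ollama handler at the Korean model by editing completion_params.json.
import json
from pathlib import Path

config_path = Path("llm_ollama/completion_params.json")
params = json.loads(config_path.read_text(encoding="utf-8"))
params["model"] = "llama3-ko"  # must match the model name registered in ollama
config_path.write_text(json.dumps(params, ensure_ascii=False, indent=2), encoding="utf-8")
```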
> And can I refer to that code for a simple service on a web server?
Sure you can. I would recommend this more modern webserver code though. Btw creating a web server that performs STT and TTS and that also scales well horizontally is far from easy.
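For a first experiment, a stripped-down server can be quite small. The sketch below is only that, a sketch: it assumes AudioToTextRecorder accepts use_microphone=False and feed_audio() for externally supplied 16 kHz, 16-bit mono PCM (the pattern used in the browser-client example), and it ignores horizontal scaling, reconnects and TTS entirely:

```python
# Rough sketch: a WebSocket endpoint that feeds raw audio chunks into RealtimeSTT
# and sends finished transcriptions back to the client.
# pip install websockets  (older websockets versions use handler(websocket, path))
import asyncio
import websockets
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(use_microphone=False, language="ko")

async def handle(websocket):
    loop = asyncio.get_running_loop()

    async def send_transcripts():
        while True:
            # recorder.text() blocks until a sentence is finished, so run it in a worker thread
            text = await loop.run_in_executor(None, recorder.text)
            if text:
                await websocket.send(text)

    sender = asyncio.create_task(send_transcripts())
    try:
        async for chunk in websocket:   # expects 16 kHz, 16-bit mono PCM bytes
            recorder.feed_audio(chunk)
    finally:
        sender.cancel()

async def main():
    async with websockets.serve(handle, "0.0.0.0", 8011):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```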
> And is it available only on Windows OS?
All the basic techniques work on Ubuntu too, RealtimeSTT as well as RealtimeTTS. Maybe you want to switch from ollama to another LLM provider there; I think vLLM would be the fastest.
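If you do try vLLM, a common setup is to start its OpenAI-compatible server (for example with `vllm serve <model>`) and talk to it through the openai client. The sketch below assumes the full-precision base model beomi/Llama-3-Open-Ko-8B on the default port 8000; vLLM normally loads the original Hugging Face weights rather than a GGUF quantization:

```python
# Talk to a local vLLM instance through its OpenAI-compatible endpoint.
# Assumes something like `vllm serve beomi/Llama-3-Open-Ko-8B` is already running.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="beomi/Llama-3-Open-Ko-8B",   # must match the model vLLM was started with
    messages=[{"role": "user", "content": "안녕하세요!"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```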
> Do I have to clone your git repository and keep adding and removing only the modified parts? Is there no easy way to keep this up to date?
To merge your custom changes with updates to the RealtimeSTT package, fork the repository, make your changes locally, and track updates from the original source. When updates are available, merge them with your custom code. This way git helps automate the merging of updates with your custom changes, so you don’t need to manually copy and paste code. It handles most of the work, and you only resolve conflicts if necessary.
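Concretely, the usual workflow looks like this (replace YOURNAME with your GitHub account; the default branch of RealtimeSTT is master):

```
# one-time setup: clone your fork and register the original repository as "upstream"
git clone https://github.com/YOURNAME/RealtimeSTT.git
cd RealtimeSTT
git remote add upstream https://github.com/KoljaB/RealtimeSTT.git

# each time a new version is released: fetch it and merge it into your branch
git fetch upstream
git merge upstream/master   # git merges automatically; you only resolve conflicts
```

Installing your fork with pip install -e . (editable mode) instead of the PyPI package also means your local edits and the merged updates are picked up immediately, instead of living only in site-packages.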
I want to integrate RealtimeSTT with an LLM to identify the intent of the transcribed text more accurately and to define precise events from the identified intent.
For example, when controlling a robot in real time with voice recognition, the STT result must be very accurate if events are defined directly from the text inferred by STT. However, if an LLM is added on top, the speaker's intent can be identified robustly even when the inferred text is slightly less accurate. Are you working on a project that takes this into consideration?
In other words, I want to work on a realtime STT + LLM project, and I would like to ask for some advice. Can you recommend a reference or methodology that I could try?
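One simple pattern to experiment with, purely as a sketch (the intent labels, the prompt, and the model name are all made up for illustration), is to hand each finalized transcription to the LLM and force it to answer with one label from a fixed set of robot commands, so that slightly garbled STT output still maps to a valid event:

```python
# Sketch: map a (possibly noisy) Korean transcription to one of a few robot commands.
# Assumes an ollama server on localhost:11434 serving a model registered as "llama3-ko".
import requests

INTENTS = ["MOVE_FORWARD", "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT", "STOP", "UNKNOWN"]

def classify_intent(transcript: str) -> str:
    prompt = (
        "You map voice commands for a robot to one intent label.\n"
        f"Allowed labels: {', '.join(INTENTS)}\n"
        "Answer with exactly one label and nothing else.\n"
        f"Command: {transcript}"
    )
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3-ko", "prompt": prompt, "stream": False},
        timeout=60,
    )
    r.raise_for_status()
    answer = r.json()["response"].strip().upper()
    return answer if answer in INTENTS else "UNKNOWN"

# e.g. fed from RealtimeSTT: recorder.text(lambda t: print(classify_intent(t)))
print(classify_intent("앞으로 조금만 가 줘"))  # ideally returns MOVE_FORWARD
```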