Open will-lumley opened 1 month ago
SwiftLlama itself is very lightweight; you can free the model-related resources by releasing the SwiftLlama object instance. If stop() freed the memory the model uses, the system would have to reload the model before calling it again, so a full reinitialisation would be unavoidable.
I understand you might be thinking of freeing up resources only partially; I haven't dug into the llama.cpp code for this yet. I'm also not sure whether partially freeing memory is meaningful, as this type of system is usually designed to run with exclusive resources.
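For what it's worth, releasing the instance looks roughly like this (a minimal sketch; the initialiser label is an assumption, check the SwiftLlama README for the exact signature):

```swift
import SwiftLlama

// Hold the model behind an optional so it can be released explicitly.
var llama: SwiftLlama? = try SwiftLlama(modelPath: "/path/to/model.gguf")

// ... run inference ...

// Dropping the last reference deinitialises the instance, which frees
// the llama.cpp-side model memory along with it.
llama = nil

// To use the model again, create a fresh instance; the model file is
// loaded from disk again at this point.
llama = try SwiftLlama(modelPath: "/path/to/model.gguf")
```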
Regarding stopping long-running cases, the maxTokenCount parameter in the configuration serves this purpose.
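As a sketch of that, assuming the configuration type accepts a maxTokenCount argument (the labels here are guesses from this thread, not the confirmed API):

```swift
import SwiftLlama

// Cap generation at 200 tokens so a runaway completion stops on its own.
let config = Configuration(maxTokenCount: 200)
let llama = try SwiftLlama(modelPath: "/path/to/model.gguf",
                           modelConfiguration: config)
```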
Is it possible to stop the model? We have start(), but we don’t have a stop() equivalent.
In certain scenarios, it would be useful to have the ability to gracefully stop or terminate a running model inference process, especially when it’s being used in environments where resource management is crucial.
A stop() function could help with:
- Freeing up resources such as memory or compute when the model is no longer needed.
- Handling cases where inference is taking too long and needs to be interrupted.
- Ensuring that models can be started and stopped dynamically without having to reinitialise the whole model object each time.
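For illustration, the interruption part can be approximated today with Swift concurrency by consuming the token stream inside a cancellable Task. This is only a sketch with assumed names (start(for:) returning an async token stream, a Prompt value built elsewhere), and cancelling the Task only stops consumption on the Swift side; whether llama.cpp keeps generating underneath is exactly the open question:

```swift
import SwiftLlama

// Wrap consumption of the async token stream in a Task we can cancel.
func runCancellable(llama: SwiftLlama, prompt: Prompt) -> Task<Void, Error> {
    Task {
        for try await token in await llama.start(for: prompt) {
            try Task.checkCancellation()   // stop consuming once cancelled
            print(token, terminator: "")
        }
    }
}

// let inference = runCancellable(llama: llama, prompt: prompt)
// Later, e.g. from a "Stop" button handler:
// inference.cancel()
```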
Is this something that could be added, or is there already a workaround for this use case?
Thanks!