ShenghaiWang / SwiftLlama

A Swift Wrapper for llama.cpp
MIT License

Stopping the Model #15

Open will-lumley opened 1 month ago

will-lumley commented 1 month ago

Is it possible to stop the model? We have start(), but we don’t have a stop() equivalent.

In certain scenarios, it would be useful to have the ability to gracefully stop or terminate a running model inference process, especially when it’s being used in environments where resource management is crucial.

A stop() function could help with:

- Freeing up resources like memory or compute when the model is no longer needed.
- Handling cases where the inference is taking too long and needs to be interrupted.
- Ensuring that models can be started and stopped dynamically without having to deinitialise and reinitialise the whole model object.

Is this something that could be added, or is there already a workaround for this use case?
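One possible workaround today is to wrap token consumption in a Swift concurrency `Task` and cancel it from the caller. This is a sketch only: `start(for:)` returning an `AsyncStream<String>` of tokens is an assumption about SwiftLlama's API, and cancelling the task stops *consumption* on the app side; whether the underlying llama.cpp decode loop halts depends on the wrapper.

```swift
import Foundation

// Hypothetical sketch, not the library's confirmed API.
// Assumes `SwiftLlama.start(for:)` yields tokens as an AsyncStream<String>.
final class GenerationController {
    private var task: Task<Void, Never>?

    func begin(prompt: String, llama: SwiftLlama) {
        task = Task {
            for await token in await llama.start(for: prompt) {
                if Task.isCancelled { break }   // stop reading tokens early
                print(token, terminator: "")
            }
        }
    }

    // A stop() in user code: cancels the consuming task.
    func stop() {
        task?.cancel()
        task = nil
    }
}
```

This gives callers a stop() at the application layer, but it does not free model memory; for that, the maintainer's suggestion below (releasing the instance) applies.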

Thanks!

ShenghaiWang commented 1 month ago

SwiftLlama is very lightweight; you can free the model-related resources by releasing the SwiftLlama object instance. If stop() freed the memory the model uses, the system would have to reload the model before calling it again, so reinitialising would be a must.
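Releasing the instance can be sketched like this, assuming SwiftLlama frees its llama.cpp resources when the object is deallocated (the initialiser signature here is illustrative, not the confirmed API):

```swift
// Sketch: tie model memory to the object's lifetime. Names are illustrative.
var llama: SwiftLlama? = try SwiftLlama(modelPath: "/path/to/model.gguf")

// ... run inference ...

llama = nil   // drop the last reference; model memory is released on deinit

// Using the model again requires reinitialising, which reloads the weights:
llama = try SwiftLlama(modelPath: "/path/to/model.gguf")
```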

I understand you might be thinking of freeing up resources partially; I haven't dug into the llama.cpp code for this yet. I'm also not sure whether partially freeing memory is meaningful, as this type of system is usually designed to run with exclusive resources.

Regarding stopping long-running cases, the maxTokenCount parameter in the configuration is used for this purpose.
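For example, generation length can be bounded up front. The `maxTokenCount` name comes from the comment above; the surrounding `Configuration` and initialiser shapes are assumptions for illustration:

```swift
// Sketch: cap generation so a run cannot exceed a known token budget.
// Only `maxTokenCount` is confirmed by the thread; other names are assumed.
let config = Configuration(maxTokenCount: 256)   // generation stops after 256 tokens
let llama = try SwiftLlama(modelPath: "/path/to/model.gguf",
                           modelConfiguration: config)
```

This bounds worst-case run time deterministically, which covers the "inference taking too long" case without needing a mid-run stop().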