I successfully run llamafile models on a server, but the process is tied to the SSH session I start it from. I'd like to know whether the models can be executed as background processes, and if so, what a good option for doing that would be. I browsed the issues and docs but couldn't find an option for it.
The goal is to have an LLM service running on a server and to consume it via a REST API, but I'm afraid the project might not be intended for that purpose.
There are many ways to do this. You could use utilities like `screen` or `tmux`, or run the command as usual with an `&` appended so it runs in the background: `./llamafile <params> &`.