Open ericcurtin opened 1 week ago
This PR changes the default runtime from 'llama.cpp' to 'llama-cpp-python' and adds support for the 'llama-cpp-python' server implementation. The changes involve modifying the server execution logic and updating the CLI configuration to accommodate the new runtime option.
```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Model
    User->>CLI: Run command with --runtime flag
    CLI->>Model: Pass runtime argument
    alt Runtime is vllm
        Model->>CLI: Execute vllm server
    else Runtime is llama.cpp
        Model->>CLI: Execute llama-server
    else Runtime is llama-cpp-python
        Model->>CLI: Execute llama_cpp.server
    end
```
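For readers of the diagram, here is a minimal sketch of what the runtime dispatch in `Model.serve` could look like. This is illustrative only, not the PR's actual code: the exact flags and argument names are assumptions. `llama_cpp.server` is the OpenAI-compatible server module that llama-cpp-python ships, launched as a Python module.

```python
import os

def serve(args):
    # Dispatch on the --runtime value; flags below are illustrative.
    if args.runtime == "vllm":
        exec_args = ["vllm", "serve", "--port", str(args.port), args.MODEL]
    elif args.runtime == "llama-cpp-python":
        # llama-cpp-python's server is started as a Python module
        exec_args = ["python3", "-m", "llama_cpp.server",
                     "--model", args.MODEL, "--port", str(args.port)]
    else:  # "llama.cpp"
        exec_args = ["llama-server", "-m", args.MODEL, "--port", str(args.port)]
    # Replace the current process with the chosen server
    os.execvp(exec_args[0], exec_args)
```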
```mermaid
classDiagram
    class CLI {
        -runtime: String
        +configure_arguments(parser)
    }
    class Model {
        +serve(args)
    }
    CLI --> Model: uses
    note for CLI "Updated default runtime to 'llama-cpp-python' and added it as a choice"
```
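And a sketch of the corresponding CLI change the class diagram describes; the choices list and help text are assumptions based on the diagram and the summary below, not the file's verbatim contents.

```python
import argparse

def configure_arguments(parser: argparse.ArgumentParser):
    # New default is llama-cpp-python; llama.cpp and vllm remain selectable.
    parser.add_argument(
        "--runtime",
        default="llama-cpp-python",
        choices=["llama-cpp-python", "llama.cpp", "vllm"],
        help="specify the runtime to use",
    )
```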
Change | Details | Files |
---|---|---|
Added llama-cpp-python as a new runtime option and made it the default | | ramalama/cli.py |
Implemented server execution logic for llama-cpp-python runtime | | ramalama/model.py |
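For illustration, a hypothetical end-to-end invocation once the new default is in place; the `serve` subcommand and the model name `granite` are assumptions for the example.

```python
import subprocess

# Equivalent to typing: ramalama serve --runtime llama-cpp-python granite
subprocess.run(["ramalama", "serve", "--runtime", "llama-cpp-python", "granite"])
```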
@cooktheryan @lsm5 @mrunalp @slp @rhatdan @tarilabs @umohnani8 @ygalblum PTAL
@ygalblum we probably need to push some container images before merging this, but when we do that, we should be all good.
Changed default runtime from 'llama.cpp' to 'llama-cpp-python'. Added 'llama-cpp-python' as a runtime option for better flexibility with the `--runtime` flag.

Summary by Sourcery
Add 'llama-cpp-python' as a new runtime option and set it as the default runtime, enhancing flexibility in model serving.
New Features:
- Add 'llama-cpp-python' as a supported option for the `--runtime` flag, with server execution logic for it.

Enhancements:
- Change the default runtime from 'llama.cpp' to 'llama-cpp-python'.