Closed: RebelOfDeath closed this 4 months ago
As I am extending LangChain's `BaseLLM` with better `vllm` integration, you may use the `BaseLLM` methods (`invoke`/`ainvoke` in this case). I will soon add the respective input/output parsers to ensure the format follows the model specifications.
Done. We generate completions using a LangChain `Runnable`, so this is the interface if you must use one. In practice, this corresponds to the chain defined in `server.completions.__init__`.
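As an illustration only (the real chain in `server.completions.__init__` may be composed differently), a `Runnable` chain of this shape looks roughly like:

```python
from langchain_core.language_models import FakeListLLM
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# Stand-in model so the snippet runs without a GPU; the real chain would
# use the vLLM-backed LLM instead.
llm = FakeListLLM(responses=["A sample completion."])

prompt = PromptTemplate.from_template("{prompt}")
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"prompt": "Write a haiku about GPUs."}))
```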
The chain is set up in the FastAPI `lifespan`, which is passed to the `app`. This means that all models will be loaded before the endpoints are set up.
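A rough sketch of that wiring, assuming a hypothetical `build_chain` helper and a `/v1/completions` route (both placeholders, not the repo's actual names):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request
from langchain_core.runnables import RunnableLambda


def build_chain():
    # Placeholder for the real prompt | llm | parser chain shown earlier.
    return RunnableLambda(lambda payload: f"completion for: {payload.get('prompt', '')}")


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Models load here, before any route is served.
    app.state.completion_chain = build_chain()
    yield
    # Optionally release GPU memory / shut down the engine on exit.


app = FastAPI(lifespan=lifespan)


@app.post("/v1/completions")
async def completions(request: Request):
    payload = await request.json()
    text = await request.app.state.completion_chain.ainvoke(payload)
    return {"text": text}
```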
Provide an interface to the application layer that can invoke a method which generates a completion for a given request and returns it to the application in the intended format.
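One way to state that contract, purely as an assumed illustration (the `CompletionRequest`/`CompletionResponse`/`CompletionBackend` names are not from the repo), is a small protocol the application layer can depend on:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class CompletionRequest:
    prompt: str
    max_tokens: int = 256


@dataclass
class CompletionResponse:
    text: str


class CompletionBackend(Protocol):
    async def generate(self, request: CompletionRequest) -> CompletionResponse:
        """Generate a completion for the request and return it in the expected format."""
        ...
```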