Abraxas-365 / langchain-rust

🦜️🔗LangChain for Rust, the easiest way to write LLM-based programs in Rust
MIT License
491 stars · 63 forks

Pull missing model for Ollama LLM generation #148

Closed — erhant closed this 3 months ago

erhant commented 3 months ago

Is your feature request related to a problem? Please describe.

While using Ollama for LLM generation, if the model does not exist locally, the API returns an error saying the model does not exist.

For example, the llm_ollama.rs example uses llama2 (https://github.com/Abraxas-365/langchain-rust/blob/main/examples/llm_ollama.rs#L15), and if you run the example you get:

```
OpenAIError(ApiError(ApiError { message: "model 'llama2' not found, try pulling it first", type: Some("api_error"), param: None, code: None }))
```

Describe the solution you'd like

I would perhaps suggest that we provide a simple wrapper for Ollama in particular. We already have an OllamaEmbedder struct that handles the client setup; we could have a more fine-grained setup where the embedder and LLM models are passed in separately.

Describe alternatives you've considered

As a workaround, we are currently using langchain-rust and ollama-rs together in our project https://github.com/andthattoo/dkn-search-node/, using the latter to pull the model.
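The workaround described above can be sketched roughly as follows. The `is_model_missing` helper and the overall wiring are illustrative, not langchain-rust code; the ollama-rs calls shown in the comment (`list_local_models`, `pull_model`) do exist in that crate, but the missing-model check itself is kept as a pure function here:

```rust
// Sketch of the "pull before generating" workaround. Only the pure
// missing-model check is runnable here; the ollama-rs wiring below it
// needs a running Ollama server, so it is shown as a comment.

/// Returns true when `model` is not among the locally available models.
/// Ollama lists models as "name:tag", so "llama2" should match "llama2:latest".
fn is_model_missing(local_models: &[String], model: &str) -> bool {
    !local_models.iter().any(|m| {
        m.as_str() == model || m.split(':').next() == Some(model)
    })
}

/* Async wiring with ollama-rs (hypothetical glue around real API calls):

use ollama_rs::Ollama;

async fn ensure_model(ollama: &Ollama, model: &str) -> Result<(), Box<dyn std::error::Error>> {
    let local: Vec<String> = ollama
        .list_local_models()
        .await?
        .into_iter()
        .map(|m| m.name)
        .collect();
    if is_model_missing(&local, model) {
        // Second argument: allow insecure (non-TLS) registry connections.
        ollama.pull_model(model.to_string(), false).await?;
    }
    Ok(())
}
*/

fn main() {
    let local = vec!["llama2:latest".to_string(), "mistral:7b".to_string()];
    assert!(!is_model_missing(&local, "llama2"));
    assert!(is_model_missing(&local, "phi3"));
    println!("ok");
}
```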

prabirshrestha commented 3 months ago

I'm open to having Ollama for LLM generation. Currently Ollama doesn't support OpenAI-compatible API endpoints, hence we have OllamaEmbedder. Even if it did, we might still want a dedicated wrapper, since there are other features such as keep-alive that aren't supported by OpenAI. As long as we have a flag for disabling auto pull and loading, I'm OK with having this. Let us know if you are interested in a PR.
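The opt-out flag requested above could take a shape like this. `OllamaConfig`, `auto_pull`, and the builder methods are all hypothetical names, sketched only to show what disabling auto-pull might look like; nothing here is existing langchain-rust API:

```rust
// Hypothetical configuration sketch for the proposed Ollama LLM wrapper.
// `auto_pull` defaults to true, and callers can opt out as suggested.

#[derive(Debug, Clone)]
struct OllamaConfig {
    model: String,
    /// Pull the model automatically when it is not available locally.
    auto_pull: bool,
    /// Keep-alive duration passed through to Ollama (e.g. "5m").
    keep_alive: Option<String>,
}

impl OllamaConfig {
    fn new(model: impl Into<String>) -> Self {
        Self { model: model.into(), auto_pull: true, keep_alive: None }
    }

    fn with_auto_pull(mut self, enabled: bool) -> Self {
        self.auto_pull = enabled;
        self
    }

    fn with_keep_alive(mut self, duration: impl Into<String>) -> Self {
        self.keep_alive = Some(duration.into());
        self
    }
}

fn main() {
    let cfg = OllamaConfig::new("llama2")
        .with_auto_pull(false)
        .with_keep_alive("5m");
    assert!(!cfg.auto_pull);
    assert_eq!(cfg.model, "llama2");
    println!("{:?}", cfg);
}
```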

erhant commented 3 months ago

Sure, I would like to give it a go! I feel like we would be repeating much of the setup that https://github.com/pepperoni21/ollama-rs already does, especially things like keep-alive and many other generation request settings.

It could perhaps be feature-gated: if an ollama feature is enabled, ollama-rs is used for all of these tasks, embeddings included.
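Feature-gating along those lines would look something like this in Cargo.toml; the feature name `ollama` and the optional-dependency wiring are a sketch, not the crate's actual manifest:

```toml
# Hypothetical Cargo.toml fragment: `ollama` is an opt-in feature that
# pulls in ollama-rs. Code using it would then be compiled behind
# #[cfg(feature = "ollama")] in the crate.
[dependencies]
ollama-rs = { version = "0.1", optional = true }

[features]
ollama = ["dep:ollama-rs"]
```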

prabirshrestha commented 3 months ago

Seems like ollama-rs might support function calling soon, based on https://github.com/pepperoni21/ollama-rs/issues/50#issuecomment-2117950797. It would also be good to reuse it instead of creating our own.

@erhant would you like to contribute to Ollama llm?

erhant commented 3 months ago

> Seems like ollama-rs might support function calling soon, based on pepperoni21/ollama-rs#50 (comment). It would also be good to reuse it instead of creating our own.
>
> @erhant would you like to contribute to Ollama llm?

https://github.com/Abraxas-365/langchain-rust/issues/149#issuecomment-2118278063