JSv4 / OpenContracts

Mass document analytics platform based on LlamaIndex, Pgvector, React and Django.
https://JSv4.github.io/OpenContracts/
Apache License 2.0

Possibility of offloading to Ollama based endpoint instead of OpenAI? #154

Closed: jessestevens5b closed this issue 2 months ago

jessestevens5b commented 2 months ago

I'm still pretty new to your project, but from digging around I can see various calls to OpenAI's API, and there's a section in the config for the API key.

Is it possible to offload to an Ollama instance instead so that all data endpoints are locally based?

Our big focus is that any kind of processing like this for our sensitive documents needs to occur 100% locally.

I see that LlamaIndex supports Ollama as an endpoint. Is that how OpenAI is being reached (through LlamaIndex)?

JSv4 commented 2 months ago

You want to look in the tasks module, specifically opencontractserver/tasks/data_extract_tasks.py. We are using LlamaIndex there, but you can write your own extractors and use whatever Python code you want. The easiest starting point is probably to take my code and replace the OpenAI LLM with Ollama, which is straightforward with LlamaIndex. LlamaIndex also supports HuggingFace inference endpoints, so you could host LLMs there too.
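
For illustration, here's a rough sketch of that swap using LlamaIndex's Settings object; the model names and base_url are placeholders for your own setup, not values taken from the project's task code:

```python
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.llms.openai import OpenAI

# Roughly what an OpenAI-backed setup looks like
# ("gpt-4" is a placeholder, not necessarily the model the repo uses).
Settings.llm = OpenAI(model="gpt-4")

# Local alternative: point LlamaIndex at a self-hosted Ollama server instead.
Settings.llm = Ollama(
    model="llama3",                     # any model pulled into Ollama
    base_url="http://localhost:11434",  # assumed local Ollama endpoint
    request_timeout=120.0,
)
```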

jessestevens5b commented 2 months ago

Would it be possible to have an if statement in there that tests whether settings.OLLAMA_MODEL etc. are present and switches to Ollama if they exist? That way we could change the endpoint by providing config for the base_url, the model, and the request_timeout for Ollama.

I don't yet know deeply enough how this would affect everything, or how you are pulling your settings in, but it would be super useful for those of us who cannot offload to hardware owned by others.
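
Something along these lines, maybe (the OLLAMA_* setting names are placeholders I'm imagining, not settings the project actually has, and the OPENAI_API_KEY name is assumed too):

```python
from django.conf import settings
from llama_index.llms.ollama import Ollama
from llama_index.llms.openai import OpenAI


def build_llm():
    """Return an Ollama-backed LLM when OLLAMA_* settings are present, else OpenAI."""
    # OLLAMA_MODEL, OLLAMA_BASE_URL and OLLAMA_REQUEST_TIMEOUT are hypothetical
    # setting names used only for this sketch.
    if getattr(settings, "OLLAMA_MODEL", None):
        return Ollama(
            model=settings.OLLAMA_MODEL,
            base_url=getattr(settings, "OLLAMA_BASE_URL", "http://localhost:11434"),
            request_timeout=getattr(settings, "OLLAMA_REQUEST_TIMEOUT", 120.0),
        )
    # Assumed name for however the project currently stores the OpenAI key.
    return OpenAI(api_key=settings.OPENAI_API_KEY)
```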

jessestevens5b commented 2 months ago

Some quick-and-dirty code using the LlamaIndex library that does the job with my local Ollama endpoint:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Use a local Ollama server for the LLM and a local HuggingFace model for embeddings
Settings.llm = Ollama(model='llama3', base_url='http://192.168.20.200:11434', request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name='multi-qa-MiniLM-L6-cos-v1', cache_folder="./models")

print("Loading documents...")
documents = SimpleDirectoryReader('./data').load_data()
print("Loaded", len(documents), "documents")

print("Indexing documents...")
index = VectorStoreIndex.from_documents(documents)
print("done")

print("Building query engine...")
query_engine = index.as_query_engine()
print("Query engine ready")

print("Querying...")
response = query_engine.query(
    "Does this document mention article 690?"
)
print(response)
```

JSv4 commented 2 months ago

Awesome! If you wanted to open a PR to create a task that uses Ollama, that would be AWESOME :-). We'd need to run Ollama, probably in another container, so you'd need to update the compose stack too. Would definitely welcome the contribution (and would be happy to review / pair / consult). It's something I've wanted to do; I just haven't had the time to do it all :-)
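
Roughly, I'd imagine adding something like this to the compose stack (service name, image tag, port, and volume are just placeholders, not the project's actual config):

```yaml
# Hypothetical addition to the compose file; names and ports are illustrative only.
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"                 # default Ollama API port
    volumes:
      - ollama_models:/root/.ollama   # persist pulled models between restarts

volumes:
  ollama_models:
```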

jessestevens5b commented 2 months ago

OK, I'll see if I can put some time into it. I'm not all that familiar with Docker, so some of that aspect is a mystery to me.

Personally, I'd probably keep the Ollama installation separate, as it changes often, and you'd want it to be its own service rather than lumped in. Access is still via its API, so it's nice and simple.