Filimoa / open-parse

Improved file parsing for LLMs
https://filimoa.github.io/open-parse/
MIT License
2.54k stars, 99 forks

support embeddings via ollama #21

Closed: miku closed this 7 months ago

miku commented 7 months ago

Note: This may be made obsolete by #8.

This adds semantic_transforms.OllamaEmbeddings, which calculates embeddings locally using ollama (https://ollama.com/) and follows the API of OpenAIEmbeddings. Currently, ollama does not support batching, so texts are embedded one request at a time (batching is on their roadmap, cf. https://ollama.com/blog/embedding-models).

The LocalSemanticIngestionPipeline shows how it can be used.
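
As a rough illustration of what embedding without batching looks like, here is a minimal sketch that sends one request per text to ollama's /api/embeddings endpoint. It is not the code from this PR: the class name SketchOllamaEmbedder, the embed_many method name, and the injectable transport argument are all assumptions made for the example.

```python
import json
import urllib.request
from typing import Callable, List

# A transport sends one JSON payload to a URL and returns the decoded JSON reply.
Transport = Callable[[str, dict], dict]


def http_post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response (stdlib only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class SketchOllamaEmbedder:
    """Hypothetical stand-in for an ollama embeddings class (not the PR's code)."""

    def __init__(
        self,
        url: str = "http://localhost:11434",
        model: str = "mxbai-embed-large",
        transport: Transport = http_post_json,
    ) -> None:
        self.endpoint = url.rstrip("/") + "/api/embeddings"
        self.model = model
        self.transport = transport

    def embed_many(self, texts: List[str]) -> List[List[float]]:
        # No batching: one request per text, results kept in input order.
        vectors = []
        for text in texts:
            reply = self.transport(
                self.endpoint, {"model": self.model, "prompt": text}
            )
            vectors.append(reply["embedding"])
        return vectors
```

With a running ollama server, SketchOllamaEmbedder().embed_many(["first chunk", "second chunk"]) would return one vector per input text. Passing a different transport makes the class easy to exercise without a server.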

To test locally, install ollama, pull an embeddings model such as https://ollama.com/library/mxbai-embed-large (e.g. with ollama pull mxbai-embed-large), then run:

from openparse import processing, DocumentParser

# Pipeline that computes embeddings through a local ollama server
semantic_pipeline = processing.LocalSemanticIngestionPipeline(
    url="http://localhost:11434",  # default ollama address
    model="mxbai-embed-large",
)
parser = DocumentParser(
    processing_pipeline=semantic_pipeline,
)
parsed = parser.parse("path/to/file.pdf")

Filimoa commented 7 months ago

Thanks for taking the time to create this!

We'll be integrating embedding modules in the next few days, which should let people use many different embedding providers through a single interface (choosing which ones they install).

You can track progress in PR #23.

miku commented 7 months ago

Thanks for your work on open-parse - closing this in favor of #23.

Bruce337f commented 6 months ago

Any updates here please?