brainlid / langchain

Elixir implementation of an AI-focused LangChain-style framework.
https://hexdocs.pm/langchain/

Vector store (pgvector/Pinecone) support? #8

Open 29decibel opened 1 year ago

29decibel commented 1 year ago

Thank you @brainlid for starting this project! I played with the two live notebooks, and they work very well! Simple, very clean, well-designed interfaces. 👍

I'm wondering what the roadmap looks like going forward, especially around vector store support.

I would very much love to migrate my Node.js LangChain projects to Phoenix/Elixir.

Thanks again for the effort! Can't wait to write more using it ❤️

brainlid commented 1 year ago

Hi @29decibel! I'd love to have support for vector DBs and document searching using those vectors.

I don't personally have a need for those at the moment, so I don't plan to implement it myself.

My current focus is:

That's my short list. I'd love contributions! I'm happy to talk through API design for the features as well. 🙂

amokan commented 1 year ago

@29decibel I have quite a bit of production experience with pgvector in the context of Elixir, so I'm just tossing in my two cents, as I think this is a great conversation to get started.

I think it would be great to get an initial implementation going that creates vectors and perhaps works with them in something like ETS, for the sake of Livebook or other ephemeral scenarios. However, I'm a bit on the fence about integrating directly with something like pgvector through Ecto, as that logic likely lives in the project that uses :langchain as a dependency, right? I say that because, outside of very simple scenarios, you probably wouldn't want to leave your text-splitting strategy up to a library, and you may be doing a lot of other text preprocessing that is outside the scope of this library. But I'm the first to admit I could be thinking about this topic wrong (or maybe I'm just unlucky with the data I work with).

It's worth noting that Scholar has distance calculations covered. While it isn't a solution for thousands of embeddings, I think there is merit in considering something like it as a first pass, to support vector distance scenarios without reaching for a full DB 🤷

My final question for you: do you have any examples out there (aside from the canonical LangChain.js and Python examples) that leverage vector search without putting that logic in the core application codebase, relying instead on the implementation directly from LangChain?

brainlid commented 1 year ago

@amokan I am only passively interested in supporting pgvector. I think it's cool and I would like to have it, but I don't have any personal experience with it, and implementing it is not currently in my plans.

I would love help in this area.

brainlid commented 1 year ago

For document access, a draft PR is being worked on: https://github.com/brainlid/langchain/pull/3

amokan commented 1 year ago

@brainlid I haven't forgotten about this topic. I've been thinking about some common interfaces in this area to keep any effort on this front flexible.

I am currently tinkering with an ETS-based 'MemoryStore' in the context of LLM chains for some other efforts, and I figure something similar may be a good first step here. Basically, it uses ETS as a context window and supports vectors and distance calculations.

If that is of interest for this project, I can try to piece together a PR over the next week or two.

brainlid commented 1 year ago

@amokan Cool! I really don't know enough about this area to know if an ETS-based memory store makes sense. In principle, I'm not opposed to using ETS tables in this way with the library.