Closed jaigouk closed 1 year ago
I want to add integration specs and confirm that the search works. After this, I plan to add pgvector or Redis embeddings in later PRs, so I also want to clean up the embeddings directory.
Ready for review. @francis, when you have time, please take a look.
This has the same problem as the other PR that you fixed. Can you put that work here to support the other rubies so that these tests pass, please?
@jaigouk - did you get a chance to add the other rubies support?
Also, I'm trying to come up with a nice demo of this ability. Maybe something along the lines of this - https://github.com/hwchase17/notion-qa?
I will rebase and check
This tries to resolve https://github.com/BoxcarsAI/boxcars/issues/13 with Hnswlib.
# The Notion_DB data is from https://github.com/hwchase17/notion-qa
require 'dotenv/load'
lib_path = File.expand_path('../../lib', __dir__)
$LOAD_PATH.unshift(lib_path) unless $LOAD_PATH.include?(lib_path)
require 'boxcars'
store = Boxcars::Embeddings::Hnswlib::BuildVectorStore.call(
  training_data_path: './Notion_DB/**/*.md',
  index_file_path: './hnswlib_notion_db_index.bin',
  force_rebuild: false
)
openai_client = OpenAI::Client.new(access_token: ENV.fetch('OPENAI_API_KEY', nil))
similarity_search = Boxcars::Embeddings::SimilaritySearch.new(
  embeddings: "#{File.dirname('./')}/hnswlib_notion_db_index.json",
  vector_store: store[:vector_store],
  openai_connection: openai_client
)
A search with this setup returns:
{
:document=>
"we provide you with a laptop that suits your job. Ask HR for further info.\n
- **Workplace**: \nwe've built a pretty nice office to make sure you like being at Blendle HQ. Feel free to sit where you want. Even better: dare to switch your workplace every once in a while.\n\n# Work at Blendle\n\n
---\n\nIf you want to work at Blendle you can check our [job ads here](https://blendle.homerun.co/).
If you want to be kept in the loop about Blendle, you can sign up for [our behind the scenes newsletter](https://blendle.homerun.co/yes-keep-me-posted/tr/apply?token=8092d4128c306003d97dd3821bad06f2)."
, :distance=>120
}
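To make the shape of that result easier to reason about, here is a minimal sketch of the idea behind a similarity search, written in plain Ruby with no gems. It is not the Boxcars or Hnswlib implementation: the `Entry` struct, the `squared_l2` and `nearest` helpers, and the three-dimensional vectors are all made up for illustration, and squared Euclidean distance is only one possible metric. An HNSW index answers the same nearest-neighbor question approximately, without scanning every entry.

```ruby
# Toy stand-in for a vector store: each entry pairs a document with its embedding.
# The vectors are invented for illustration; real embeddings come from a model.
Entry = Struct.new(:document, :vector)

# Squared Euclidean (L2) distance between two equal-length vectors.
def squared_l2(a, b)
  raise ArgumentError, 'vectors must have the same length' unless a.size == b.size

  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Exhaustive search: score every entry and keep the k closest ones,
# returned closest first as { document:, distance: } hashes.
def nearest(entries, query_vector, k: 1)
  entries
    .map { |e| { document: e.document, distance: squared_l2(e.vector, query_vector) } }
    .min_by(k) { |hit| hit[:distance] }
end

entries = [
  Entry.new('Working hours are flexible.', [1.0, 0.0, 0.0]),
  Entry.new('We provide you with a laptop.', [0.0, 1.0, 0.0]),
  Entry.new('Lunch is served at noon.', [0.0, 0.0, 1.0])
]

hits = nearest(entries, [0.1, 0.9, 0.0], k: 2)
hits.each { |hit| puts "#{hit[:distance].round(2)}  #{hit[:document]}" }
```

A smaller distance means a closer match, so the first hit is the best candidate to hand to the LLM as context.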
I guess we can refine the interface further in later PRs. For pgvector, I found https://github.com/ankane/neighbor and I am trying to make it work with rom.rb: https://github.com/jaigouk/rom-neighbor
If people don't want to call OpenAI to get the query vector, then we need to use a package like https://github.com/facebookresearch/fastText, but that is a Python package. I don't know whether there is a Ruby gem for that.
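As a rough illustration of what computing a query vector locally could look like, here is a toy embedder based on the hashing trick, using only the Ruby standard library. It is a sketch, not a substitute for fastText or OpenAI embeddings: `hashed_embedding`, `cosine`, and the `dims` default are hypothetical names and values, and the vectors capture word overlap only, not meaning. Whatever embedder is used, the index and the query must be embedded by the same one, so an index built from OpenAI embeddings cannot be searched with vectors like these.

```ruby
require 'zlib'

# Hypothetical local embedder (not part of Boxcars): maps text to a fixed-size
# vector by hashing each token into a bucket, so no API call is needed.
def hashed_embedding(text, dims: 64)
  vector = Array.new(dims, 0.0)
  text.downcase.scan(/[a-z0-9]+/).each do |token|
    vector[Zlib.crc32(token) % dims] += 1.0
  end

  # Normalize to unit length so that a dot product equals cosine similarity.
  norm = Math.sqrt(vector.sum { |x| x * x })
  norm.zero? ? vector : vector.map { |x| x / norm }
end

# Cosine similarity for vectors that are already unit length.
def cosine(a, b)
  a.zip(b).sum { |x, y| x * y }
end

doc_vector   = hashed_embedding('We provide you with a laptop that suits your job.')
query_vector = hashed_embedding('Do I get a laptop?')
puts cosine(doc_vector, query_vector).round(3)
```

Because the same text always hashes to the same buckets, document vectors can be computed once at index time and query vectors on demand.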
Hey folks, just wanted to point you in the direction of Marqo, which I've been using in my own prototypes as a vector DB. It has built-in embedding generation and runs easily as a Docker instance. https://www.marqo.ai/
There are multiple options for embeddings, but most of them are paid. Also, the recommended requirement for the standalone version of Milvus is 32GB of RAM, so I wanted to start with hnswlib.
changes
why?