Closed francis closed 11 months ago
Marqo is great for this. Can include a filter expression in the query to scope the search down to certain end users, for instance.
On May 1, 2023, 4:18 PM -0600, Francis wrote:
Vector search is good for capturing semantically similar texts, but other systems are adding the ability to store meta data with each vector that can be used in union with the vector search to find things. @jaigouk - since you wrote the vector store, is this concept there already, or can it be added with a little work?
update: the file is visible with this PR https://github.com/BoxcarsAI/boxcars/pull/65
@francis something like this? https://github.com/jaigouk/boxcars/blob/feature/add-in-memory-vector-store/lib/boxcars/boxcar/vector_stores/document.rb
class Document
  attr_accessor :page_content, :metadata

  # fields: hash with optional :page_content (String) and :metadata (Hash)
  def initialize(fields = {})
    @page_content = fields[:page_content] || ""
    @metadata = fields[:metadata] || {}
  end
end
Here is a spec that uses that structure, though it does not use the metadata directly.
It would be nice to have a use case or examples for that.
And I need to clean up the hnswlib search after in_memory is done.
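As one possible use case for the metadata field, here is a minimal sketch that stores a source path with each document and reads it back. The Document class is repeated so the snippet runs on its own; the metadata keys (:source, :owner) and the sample texts are made up for illustration and are not part of the gem.

```ruby
# Hypothetical usage sketch; :source and :owner are illustrative keys only.
class Document
  attr_accessor :page_content, :metadata

  def initialize(fields = {})
    @page_content = fields[:page_content] || ""
    @metadata = fields[:metadata] || {}
  end
end

docs = [
  Document.new(page_content: "Laptops are issued on your first day.",
               metadata: { source: "handbook/equipment.md", owner: "it" }),
  Document.new(page_content: "Vacation requests go through your manager.",
               metadata: { source: "handbook/time_off.md", owner: "hr" }),
  # No metadata given, so it falls back to an empty hash.
  Document.new(page_content: "This chunk was stored without metadata.")
]

# Metadata travels with the text, so a caller can report where a chunk came from.
sources = docs.map { |doc| doc.metadata.fetch(:source, "unknown") }
puts sources
```

Using fetch with a default keeps callers from having to check whether a document was stored with metadata at all.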
Yes, I think this is on the right path @jaigouk
@obie - I will see what I can learn about Marqo. Thank you for the pointer.
@jaigouk - I pushed a small change to the main branch to move all calls to OpenAI::Client to one shared method in Boxcars::Openai and updated my sample notebook at https://github.com/BoxcarsAI/boxcars/blob/main/notebooks/Embeddings%20Search.ipynb with the new names.
I guess the next step would be to have Boxcars::VectorStores::SimilaritySearch return multiple results with a passed param and metadata about the results.
Something like:
similarity_search = Boxcars::VectorStores::SimilaritySearch.new params
results = similarity_search.call query: "Am I provided a laptop?", count: 3
and this would return the three closest matches (preferably with additional metadata such as reference path/URL and other attributes a user wanted to store with a record, but minimally the search distance)
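To make the proposed return shape concrete, here is a rough sketch of a top-`count` search that returns each match together with its distance and stored metadata. It is not the Boxcars implementation: SketchSearch, Result, and the entry layout are hypothetical, the scan is brute force, and it assumes the query has already been turned into an embedding vector (the real call would take the query text).

```ruby
# Sketch only: SketchSearch and Result are hypothetical names, not Boxcars classes.
Result = Struct.new(:entry, :distance, keyword_init: true)

class SketchSearch
  # entries: array of hashes like { embedding: [Float], text: String, metadata: Hash }
  def initialize(entries)
    @entries = entries
  end

  # Returns up to `count` results, nearest first, each carrying its distance.
  def call(query_vector:, count: 1)
    @entries
      .map { |entry| Result.new(entry: entry, distance: cosine_distance(query_vector, entry[:embedding])) }
      .sort_by(&:distance)
      .first(count)
  end

  private

  # Cosine distance: 0.0 for identical direction, up to 2.0 for opposite.
  def cosine_distance(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    1.0 - (dot / norm)
  end
end

entries = [
  { embedding: [0.0, 1.0], text: "Parking passes are free.", metadata: { source: "handbook/parking.md" } },
  { embedding: [1.0, 0.0], text: "Laptops are issued on day one.", metadata: { source: "handbook/equipment.md" } },
  { embedding: [0.7, 0.7], text: "Bring your own device is allowed.", metadata: { source: "handbook/byod.md" } },
  { embedding: [-1.0, 0.0], text: "Holiday schedule.", metadata: { source: "handbook/holidays.md" } }
]

results = SketchSearch.new(entries).call(query_vector: [1.0, 0.1], count: 3)
results.each do |r|
  puts format("%.3f  %s  %s", r.distance, r.entry[:metadata][:source], r.entry[:text])
end
```

Returning a small result object rather than a bare document leaves room for the distance and any user-supplied attributes without changing the stored record.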
@francis yup.
I will continue with https://github.com/BoxcarsAI/boxcars/issues/60. I also want to clean up the hnswlib-based code.
Question: we just return the metadata as it is, and the user decides whether to use it to filter the results further, right?
@jaigouk also, I closed the ticket already, but I renamed VectorStores to VectorStore and moved it to be a top-level peer of Boxcar, Train, and Engine. I hope that doesn't mess up your tree too much.
It looks like the VectorSearch PR might be some or all of this ticket. I think we want to filter results based on metadata and I can do this after the search. Is it possible to refine the search possibilities before searching?
Vector search will return a Document array ordered by distance. Each document instance has a metadata method that returns the original hash, if one was given. I am not injecting much info into the metadata. The order of the array is already based on distance or similarity, so I thought it is up to users of this gem to "filter" the result further.
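The difference between filtering after the search and narrowing the candidates before it can be sketched as follows. Everything here is hypothetical (the Doc struct, the nearest helper, the :team metadata key), and the search is a brute-force scan rather than the gem's vector store; an index-backed store may need a different approach to pre-filtering.

```ruby
# Sketch only: Doc, nearest, and the :team key are illustrative, not part of Boxcars.
Doc = Struct.new(:page_content, :metadata, :embedding, keyword_init: true)

# Brute-force nearest neighbours by squared Euclidean distance, nearest first.
def nearest(docs, query, count)
  docs.sort_by { |doc| doc.embedding.zip(query).sum { |a, b| (a - b)**2 } }.first(count)
end

docs = [
  Doc.new(page_content: "Sales laptops ship in a week.", metadata: { team: "sales" }, embedding: [0.9, 0.1]),
  Doc.new(page_content: "Sales phones are optional.", metadata: { team: "sales" }, embedding: [0.8, 0.3]),
  Doc.new(page_content: "Engineers pick their laptop.", metadata: { team: "eng" }, embedding: [0.6, 0.4]),
  Doc.new(page_content: "Engineering on-call rota.", metadata: { team: "eng" }, embedding: [0.1, 0.9])
]
query = [1.0, 0.0]
wanted = ->(doc) { doc.metadata[:team] == "eng" }

# Post-filter: search first, then drop results whose metadata does not match.
# This can return fewer than `count` documents, or none at all.
post = nearest(docs, query, 2).select(&wanted)

# Pre-filter: narrow the candidates by metadata, then search only those.
# This returns up to `count` matching documents.
pre = nearest(docs.select(&wanted), query, 2)

puts "post-filter: #{post.map(&:page_content)}"
puts "pre-filter:  #{pre.map(&:page_content)}"
```

In this sample the two nearest documents both belong to another team, so post-filtering leaves nothing while pre-filtering still finds two matches. That gap is the main argument for supporting a filter at search time.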
I thought about creating a demo app with Hanami v2. Even if I create a crawler, I may want to use the metadata for navigating the website structure. I guess we might need an extra boxcar for crawlers. I might need to investigate combining that with vector storage.