facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Cannot debug similarity search #3378

Closed: ssdidis closed this issue 3 months ago

ssdidis commented 4 months ago

I am trying to build a similarity search in Python but cannot debug this function:

def perform_similarity_search(query_text, index, embeddings, top_k=5):
    """Perform similarity search in the FAISS index for a given query text."""
    # Use the embeddings object to embed the query_text into a vector.
    # Ensure the text is passed as a list and the result is accessed correctly.
    query_vector = embeddings.encode([query_text])
    # Reshape the query_vector for compatibility with FAISS search method if necessary.
    # FAISS expects the query vector to be a 2D array.
    if len(query_vector.shape) == 1:
        query_vector = query_vector.reshape(1, -1)

    # Search the index using the reshaped query_vector.
    distances, indices = index.search(query_vector, top_k)  # Search the index for the top_k closest vectors
    return distances, indices

def run_indexing_pipeline():
    documents = fetch_documents(documents_dir)
    text_chunks = divide_documents_into_text_chunks(documents)
    embeddings_model = prepare_embeddings()
    faiss_index = build_and_store_faiss_index(text_chunks, embeddings_model, faiss_db_path)

    # Example query for testing purposes
    query = "Enter some example text here"
    distances, indices = perform_similarity_search(query, faiss_index, embeddings_model)
    print("Distances:", distances)
    print("Indices:", indices)
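For reference, a minimal sketch of the reshape step in perform_similarity_search, assuming only NumPy. index.search in the Faiss Python wrapper expects a 2D NumPy array of shape (n, d) with float32 values, so a plain Python list or a 1D array is not a valid query. The helper name as_faiss_query is hypothetical: it coerces a list or array into a contiguous float32 array of shape (n, d) and optionally checks the width against the index dimension (index.d on a Faiss index).

```python
import numpy as np


def as_faiss_query(vector, dim=None):
    """Coerce an embedding into the 2D float32 array that index.search() expects.

    Hypothetical helper. Accepts a list of floats, a list of lists, or a NumPy
    array of shape (d,) or (n, d). If dim is given (e.g. index.d), the width is checked.
    """
    arr = np.asarray(vector, dtype=np.float32)
    if arr.ndim == 1:
        # A single embedding of shape (d,) becomes a batch of one: (1, d).
        arr = arr.reshape(1, -1)
    if arr.ndim != 2:
        raise ValueError(f"expected a 1D or 2D embedding, got shape {arr.shape}")
    if dim is not None and arr.shape[1] != dim:
        raise ValueError(f"embedding has dimension {arr.shape[1]}, index expects {dim}")
    # Return a C-contiguous array so the buffer can be handed to Faiss directly.
    return np.ascontiguousarray(arr)
```

In the function above this would be used as query_vector = as_faiss_query(embeddings.encode([query_text]), dim=index.d), which also turns a dimension mismatch into a readable error instead of a failure deeper in the call.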

ERRORS:

Traceback (most recent call last):
  File "/home/ubuntu/new_d.py", line 61, in <module>
    run_indexing_pipeline()
  File "/home/ubuntu/new_d.py", line 56, in run_indexing_pipeline
    distances, indices = perform_similarity_search(query, faiss_index, embeddings_model)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/new_d.py", line 36, in perform_similarity_search
    query_vector = embeddings.encode([query_text])

This is the error being shown. Please let me know how I can correct it.
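The traceback stops at the encode call without its final exception line, so the actual error type is not visible in the post. One possibility, offered only as an assumption since prepare_embeddings() is not shown: encode() is a method of sentence-transformers models, while LangChain-style embeddings objects expose embed_query() and embed_documents() instead, so calling encode() on the latter would fail at exactly this line. The hypothetical helper embed_query_text below works with either interface and returns the 2D float32 array that index.search expects.

```python
import numpy as np


def embed_query_text(embeddings, text):
    """Embed one query string with whichever API the embeddings object exposes.

    Hypothetical helper: assumes the object is either sentence-transformers-style
    (with .encode(list_of_texts)) or LangChain-style (with .embed_query(text)).
    """
    if hasattr(embeddings, "encode"):
        vectors = embeddings.encode([text])  # batch of one, typically shape (1, d)
    elif hasattr(embeddings, "embed_query"):
        vectors = [embeddings.embed_query(text)]  # list[float], wrapped as a batch of one
    else:
        raise TypeError(
            f"{type(embeddings).__name__} has neither .encode() nor .embed_query()"
        )
    # Faiss search expects a 2D float32 array of shape (n, d); here n == 1.
    return np.asarray(vectors, dtype=np.float32).reshape(1, -1)
```

Printing type(embeddings_model) inside run_indexing_pipeline is a quick way to check which interface the object actually has before choosing a call.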

mlomeli1 commented 4 months ago

It looks like in this pipeline the function build_and_store_faiss_index() is a wrapper that calls the Faiss library. However, the rest of the functions are either user-defined or come from some other library; I can't really tell because your code is not reproducible. Your error is raised in embeddings.encode([query_text]), which is probably not using Faiss, since core Faiss does not support embedding text. @ssdidis, this is out of scope for us.
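To illustrate the split described above: the text-to-vector step happens entirely outside Faiss, and Faiss only receives the resulting float32 vectors. The sketch below assumes NumPy only. toy_embed is a hypothetical stand-in for a real embedding model, and exact_l2_search is a brute-force stand-in that mirrors what faiss.IndexFlatL2.search returns: squared L2 distances in ascending order plus row indices, both of shape (n_queries, k).

```python
import zlib

import numpy as np


def toy_embed(texts, dim=16):
    """Hypothetical stand-in for a real text-embedding model (this step is NOT done by Faiss).

    Counts hashed character trigrams into a fixed-size float32 vector per text.
    """
    out = np.zeros((len(texts), dim), dtype=np.float32)
    for row, text in enumerate(texts):
        t = text.lower()
        for i in range(len(t) - 2):
            # crc32 is deterministic across runs, unlike Python's built-in str hash.
            out[row, zlib.crc32(t[i:i + 3].encode("utf-8")) % dim] += 1.0
    return out


def exact_l2_search(xb, xq, k):
    """Brute-force k-NN mirroring the return convention of faiss.IndexFlatL2.search.

    Returns (D, I), each of shape (n_queries, k): squared L2 distances in ascending
    order and the matching row indices into xb. Assumes k <= len(xb).
    """
    # Squared distance between every query and every database vector: shape (nq, nb).
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx


docs = [
    "faiss indexes dense vectors",
    "the cat sat on the mat",
    "similarity search over embeddings",
]
xb = toy_embed(docs)  # text -> vectors happens here, outside the index
D, I = exact_l2_search(xb, toy_embed(["cat on a mat"]), k=2)
print("Distances:", D)
print("Indices:", I)
```

With Faiss installed, only the search step changes: index = faiss.IndexFlatL2(xb.shape[1]), then index.add(xb), then D, I = index.search(xq, k). The embedding step stays the same, and unlike this sketch, Faiss pads the result with -1 indices when fewer than k vectors are available.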