Abraxas-365 / langchain-rust

🦜️🔗LangChain for Rust, the easiest way to write LLM-based programs in Rust
MIT License

pgvector vectorstore produces double-quoting for filters, breaking them #268

Open dredozubov opened 2 days ago

dredozubov commented 2 days ago

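**Describe the bug**

The pgvector `VectorStore` implementation filters results incorrectly, so metadata filters match nothing. The `->>` operator returns a JSON string value as unquoted text. The current implementation, however, renders string filter values with their JSON double quotes still attached inside a single-quoted SQL literal (`'"earnings_transcript"'`). It should render them as a plain single-quoted literal (`'earnings_transcript'`).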
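One plausible way to get this output is to serialize the JSON value and then wrap the result in single quotes. For example, `serde_json::Value`'s `to_string()` keeps JSON's double quotes around strings. That cause is an assumption, not a reading of the crate's source. The minimal Rust sketch below contrasts the two renderings. `sql_text_literal` and `metadata_eq_clause` are hypothetical helpers, not part of langchain-rust.

```rust
/// Hypothetical helper (not part of langchain-rust): renders a bare string as a
/// PostgreSQL single-quoted text literal, doubling any embedded single quotes.
fn sql_text_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Hypothetical helper: builds an equality filter on a metadata key. `->>`
/// yields the JSON value as unquoted text, so `value` must be the bare string.
/// `alias` is interpolated as-is and is assumed to be a trusted identifier.
fn metadata_eq_clause(alias: &str, key: &str, value: &str) -> String {
    format!(
        "({}.cmetadata ->> {}) = {}",
        alias,
        sql_text_literal(key),
        sql_text_literal(value)
    )
}

fn main() {
    // What a JSON serializer produces for the string value: quotes included.
    let json_serialized = "\"earnings_transcript\"";
    // Buggy: the JSON quotes end up inside the SQL literal and never match.
    println!("{}", metadata_eq_clause("data", "doc_type", json_serialized));
    // Expected: the bare string, matching what `->>` returns.
    println!("{}", metadata_eq_clause("data", "doc_type", "earnings_transcript"));
}
```

The first line printed is `(data.cmetadata ->> 'doc_type') = '"earnings_transcript"'`, the shape seen in the logs. The second is `(data.cmetadata ->> 'doc_type') = 'earnings_transcript'`.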

**To Reproduce**

Steps to reproduce the behavior:

1. Enable query logging in Postgres and reload the configuration:

   ```sql
   ALTER SYSTEM SET log_statement = 'all';
   ALTER SYSTEM SET log_duration = on;
   ALTER SYSTEM SET log_min_duration_statement = 0;

   -- Reload configuration
   SELECT pg_reload_conf();
   ```


2. Run a similarity search with a metadata filter through the pgvector vector store.
3. Observe that the filter is rendered as `WHERE (data.cmetadata ->> 'doc_type') = '"earnings_transcript"'` instead of `WHERE (data.cmetadata ->> 'doc_type') = 'earnings_transcript'`.
4. The filtered query returns no rows:

   ```
   db=# WITH filtered_embedding_dims AS MATERIALIZED ( SELECT * FROM vs_embeddings WHERE vector_dims(embedding) = '1536' ) SELECT COUNT(*) FROM filtered_embedding_dims JOIN vs_collections ON filtered_embedding_dims.collection_id = vs_collections.uuid WHERE vs_collections.name = 'langchain' AND (filtered_embedding_dims.cmetadata ->> 'doc_type') = '"earnings_transcript"';
    count
   -------
        0
   (1 row)
   ```


**Expected behavior**

With the value written as a plain single-quoted literal, the same query returns the matching rows:

```
db=# WITH filtered_embedding_dims AS MATERIALIZED ( SELECT * FROM vs_embeddings WHERE vector_dims(embedding) = '1536' ) SELECT COUNT(*) FROM filtered_embedding_dims JOIN vs_collections ON filtered_embedding_dims.collection_id = vs_collections.uuid WHERE vs_collections.name = 'langchain' AND (filtered_embedding_dims.cmetadata ->> 'doc_type') = 'earnings_transcript';
 count
-------
    26
(1 row)
```
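Another option is to avoid rendering string literals by hand: emit a positional placeholder and let the database driver bind the bare string as a text parameter. The Rust sketch below illustrates that approach. `FilterBuilder` and its methods are hypothetical, not langchain-rust's API. The key is kept as an escaped literal rather than bound, which sidesteps any operator-resolution ambiguity for `->>` with an untyped parameter.

```rust
/// Hypothetical builder (not langchain-rust's API): emits metadata equality
/// filters with positional placeholders and collects the bare string values
/// so the database driver can bind them as text parameters.
struct FilterBuilder {
    clauses: Vec<String>,
    params: Vec<String>,
}

impl FilterBuilder {
    fn new() -> Self {
        FilterBuilder { clauses: Vec::new(), params: Vec::new() }
    }

    /// Adds `(alias.cmetadata ->> 'key') = $n`. The key stays a literal
    /// (single quotes doubled); the value is bound, so no quoting is done here.
    fn metadata_eq(mut self, alias: &str, key: &str, value: &str) -> Self {
        self.params.push(value.to_string());
        let placeholder = self.params.len(); // $1, $2, ... in insertion order
        let key = key.replace('\'', "''");
        self.clauses
            .push(format!("({alias}.cmetadata ->> '{key}') = ${placeholder}"));
        self
    }

    /// Returns the AND-joined clause and the values in placeholder order.
    fn build(self) -> (String, Vec<String>) {
        (self.clauses.join(" AND "), self.params)
    }
}

fn main() {
    let (sql, params) = FilterBuilder::new()
        .metadata_eq("filtered_embedding_dims", "doc_type", "earnings_transcript")
        .build();
    println!("WHERE {sql}");
    println!("params: {params:?}");
}
```

Placeholder numbering starts at `$1` here. If the surrounding query already binds other parameters (for example the collection name), the builder would need a starting offset.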



**Desktop:**
 - OS: macOS Ventura 13.6.9
 - Version: `langchain-rust = { version = "4.6.0", features = ["postgres"] }`
dredozubov commented 2 days ago

I made a quick fix in #269.