Closed chimezie closed 11 months ago
FYI. I was able to implement disjunctive semantics for comparing against an array of text (i.e., the query_tags argument w/ conjunctive = False) via this syntax:
WHERE 'tag 1' = ANY (tags) OR
'tag 2' = ANY (tags) OR
'tag 3' = ANY (tags)
This example is for 3 tags ('tag 1', 'tag 2', and 'tag 3'), but it works for an arbitrary number of tags where each comparison is separated by OR
@chimezie just checking for context. Osi pushed a test case in #57 this morning which already handles disjunctive semantics (he says—I haven't had a chance to review yet). Is your comment pursuant to seeing his changes, and are you proposing something different? I just asked him & he was also unclear, so I thought I'd check.
Ahh.. The comment was without having seen that implementation. I'll swap it in.
Ahh.. The comment was without having seen that implementation. I'll swap it in.
Cool! Yep, Osi says the PR is definitely ready for our review, so if you have a moment, pls go ahead. 😊
Ahh.. The comment was without having seen that implementation. I'll swap it in.
Cool! Yep, Osi says the PR is definitely ready for our review, so if you have a moment, pls go ahead. 😊
Ok. I just pushed a test case for the disjunctive/OR semantics that is failing (provides more than one tag for disjunctive evaluation).
This gave me a chance to review the pgvector test setup, so that's good.
Disjunctive eval is working fine. The reason why your added test isn't working, @chimezie is that you didn't specify limit
, and the default limit for the search
method is 1
. After the following, test passes:
- sim_search = await vDB.search(query_string='Text', query_tags=['tag1', 'tag3'], conjunctive=False)
+ sim_search = await vDB.search(query_string='Text', query_tags=['tag1', 'tag3'], conjunctive=False, limit=1000)
So the question I have is why we're setting a default limit of 1. @choccccy do you happen to know? Any reason not to make the default unlimited?
Disjunctive eval is working fine. The reason why your added test isn't working, @chimezie is that you didn't specify
limit
, and the default limit for thesearch
method is1
. After the following, test passes:
Got it. Sounds good
So the question I have is why we're setting a default limit of 1. @choccccy do you happen to know? Any reason not to make the default unlimited?
I set a one limit since unlimited might return massive lists of records, so one seemed to be the least arbitrary number to limit that possibility. I believe that the fetched list of records is now returned as a generator, so upping that limit (or completely unlimiting it) should be fine now.
tag-filtered search should be correctly implemented as of the merge from Pgvector bigdata insert and search
and released in version 0.7.0
So the question I have is why we're setting a default limit of 1. @choccccy do you happen to know? Any reason not to make the default unlimited?
I set a one limit since unlimited might return massive lists of records, so one seemed to be the least arbitrary number to limit that possibility. I believe that the fetched list of records is now returned as a generator, so upping that limit (or completely unlimiting it) should be fine now.
Yes. I think we dovetailed in the end.
The query_tags keyword argument to the .search([..]) function isn't intuitive to use. It is an array of TEXT fields. However, if you try:
await pacerDB.search(".. some search string ..", query_tags='some_tag'
)`You get:
Looking at the table structure and the way the SELECT query WHERE clause is put together, it looks like this keyword argument is directly used as a boolean-evaluating expression, which is not obvious from the documentation and seems unsafe to be directly executed in the SQL query. The documentation of that parameter suggests it should just be a list of tags and the function should construct the appropriate WHERE clause, using any of the filter expressions described here, and taking the opportunity to safely incorporate it into the conjunctive expression in the executed SQL query.
So, I would expect to be able to do something like
await pacerDB.search(".. some search string ..", query_tags=['tag1', 'tag2', 'tagN']
)`