Closed kevindharmawan closed 8 months ago
This looks good! BTW, have you investigated the score produced by match_bm25? Are there still negative values, and can we rank candidates based on that score?
have you investigated the score produced by match_bm25? Are there still negative values and can we rank candidates based on that score?
Unfortunately, there are still negative values, and I don't think the score is reliable enough to be used for ranking.
This PR will:
fulltext_index_duckdb.py's fts_query function.
With stopwords='english', DuckDB's FTS index removes stopwords before storing the searchable data. However, when the client sends a keyword to search for, the stopwords in that keyword are not removed. For example, "University of Chicago" is stored as "University Chicago" (the actual implementation may differ slightly), but a query for "University of Chicago" is not read as "University Chicago", so FTS returns an empty result. By contrast, with stopwords='none', "University of Chicago" is still stored as "University of Chicago", and a query for "University Chicago" still finds "University of Chicago".
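The mismatch described above can be reproduced with a small toy model. This is only a sketch and not DuckDB's FTS implementation: the stopword set, the helper names (tokenize, build_index, search), and the rule that a document matches only when every query term is stored are all assumptions made for illustration.

```python
# Toy model only: this is NOT how DuckDB's FTS extension is implemented.
# Assumption: a document matches only if every query term is in its stored tokens.
ENGLISH_STOPWORDS = {"of", "the", "a", "an", "and", "in"}  # tiny illustrative subset


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs, stopwords):
    """Store one token set per document, dropping stopwords at index time."""
    return [{t for t in tokenize(doc) if t not in stopwords} for doc in docs]


def search(index, docs, query):
    """Return documents whose stored tokens contain every query token.

    The query is deliberately NOT filtered for stopwords, mirroring the
    behaviour described in the PR text.
    """
    terms = set(tokenize(query))
    return [doc for doc, tokens in zip(docs, index) if terms <= tokens]


docs = ["University of Chicago"]
english_index = build_index(docs, ENGLISH_STOPWORDS)  # like stopwords='english'
none_index = build_index(docs, set())                 # like stopwords='none'

# "of" was stripped from the index but not from the query, so nothing matches.
result_english = search(english_index, docs, "University of Chicago")
# Every query term is stored, so the document is found.
result_none = search(none_index, docs, "University Chicago")

print(result_english)
print(result_none)
```

In this model the first search comes back empty because "of" exists only on the query side, while the second finds the document because nothing was removed at index time.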