apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

VECTOR_SIMILARITY second operand must be a float array #12359

Closed hdulay closed 7 months ago

hdulay commented 8 months ago

I'm testing the VECTOR_SIMILARITY function with the statement below, but it fails with this error: For VECTOR_SIMILARITY predicate, the second operand must be a float array literal

select ProductId, UserId, l2_distance(ARRAY[0.1,0.1,0.3,0.4],ARRAY[0.1,0.1,0.3,0.4]) as l2_dist, n_tokens, combined
from fineFoodReviews
where VECTOR_SIMILARITY(ARRAY[0.1,0.1,0.3,0.4],ARRAY[0.1,0.1,0.3,0.4], 5)
-- order by l2_dist ASC
limit 5
ProcessingException(errorCode:150, message:SQLParsingError:
org.apache.pinot.sql.parsers.SqlCompilationException: For VECTOR_SIMILARITY predicate, the second operand must be a float array literal, got: Expression(type:FUNCTION, functionCall:Function(operator:VECTOR_SIMILARITY, operands:[Expression(type:LITERAL, literal:<Literal doubleArrayValue:[0.1, 0.1, 0.3, 0.4]>), Expression(type:LITERAL, literal:<Literal doubleArrayValue:[0.1, 0.1, 0.3, 0.4]>), Expression(type:LITERAL, literal:<Literal longValue:5>)]))
    at org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter.updateFunctionExpression(PredicateComparisonRewriter.java:139)
    at org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter.updatePredicate(PredicateComparisonRewriter.java:65)
    at org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter.rewrite(PredicateComparisonRewriter.java:40)
    at org.apache.pinot.sql.parsers.CalciteSqlParser.queryRewrite(CalciteSqlParser.java:569))

A workaround is to use a CTE with multistage = true:

with DIST as (
  SELECT 
    ProductId, 
    Summary, 
    Score,
    l2_distance(ARRAY[0.1,0.1,0.3,0.4],ARRAY[0.1,0.1,0.3,0.4]) AS l2_dist
  from fineFoodReviews
)
select * from DIST
where l2_dist < .6
order by l2_dist asc

hdulay commented 8 months ago

Here is the SQL, executed from Python, that produced this error ({search_embedding} is a Python format placeholder for the query vector):

SELECT 
  ProductId, 
  Summary, 
  Score,
  l2_distance(embedding, ARRAY{search_embedding}) AS l2_dist
from fineFoodReviews
where VECTOR_SIMILARITY(embedding, ARRAY{search_embedding}, 5)
order by l2_dist asc
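For readers reproducing this, here is a minimal sketch of how the query string above might be assembled in Python. The helper names `to_array_literal` and `build_query` are hypothetical and not part of any Pinot client. The sketch assumes `search_embedding` is a sequence of numbers. It renders the vector explicitly instead of relying on `ARRAY{search_embedding}`, which only yields a valid literal when the value is a plain Python list (a NumPy array, for example, prints without commas). It only builds the SQL text and does not work around the error reported here.

```python
def to_array_literal(values):
    """Render a sequence of numbers as a SQL ARRAY[...] literal.

    Hypothetical helper: each value is converted with float() and repr(),
    so very small or large values may appear in exponent notation.
    """
    return "ARRAY[" + ",".join(repr(float(v)) for v in values) + "]"


def build_query(search_embedding, top_k=5):
    """Build the query from this comment for a given vector (hypothetical helper)."""
    array_sql = to_array_literal(search_embedding)
    return (
        "SELECT ProductId, Summary, Score, "
        f"l2_distance(embedding, {array_sql}) AS l2_dist "
        "FROM fineFoodReviews "
        f"WHERE VECTOR_SIMILARITY(embedding, {array_sql}, {int(top_k)}) "
        "ORDER BY l2_dist ASC"
    )


# Example vector taken from the first query in this issue.
query = build_query([0.1, 0.1, 0.3, 0.4])
print(query)
```

The resulting string can then be passed to whatever client is used to submit SQL to the broker.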

xiangfu0 commented 8 months ago

Thanks for reporting! This is a regression introduced by https://github.com/apache/pinot/pull/12118 and the fix is here: https://github.com/apache/pinot/pull/12365