alibaba / havenask

Apache License 2.0

Can ha3 support scoring similar to ES should clauses? #216

Closed arscarrot closed 1 year ago

arscarrot commented 1 year ago

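For example, there is an array field attr_catid, and I want to boost docs whose values include 10085 or 10086. I found that this only works with a NUMBER index. The query looks something like this:

select constant_score('attr_catid', '10086^20,10085^10') where contain(xxx, 'xxx') andor contain(attr_catid, '10085|10086')

Here constant_score('attr_catid', '10086^20,10085^10') is a UDF, contain(xxx, 'xxx') is the match condition, and contain(attr_catid, '10085|10086') is only there to pick up the corresponding terms, not to filter matches. However, there is currently no syntax like A ANDOR B, meaning A must match while B may or may not match. Among the docs matched this way, a doc gets extra score if its attr_catid field contains 10085 or 10086, and no extra score otherwise. This is a common pattern in ES. Is there a similar scenario internally? Looking forward to everyone's replies.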
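A minimal sketch of the scoring behaviour described above, written in Python purely for illustration. It is not Havenask code: the function names `parse_boosts` and `constant_score` are hypothetical stand-ins, and summing the weights of all matching terms is an assumption about how the UDF would combine them.

```python
def parse_boosts(spec):
    """Parse a spec such as '10086^20,10085^10' into {10086: 20.0, 10085: 10.0}."""
    boosts = {}
    for item in spec.split(","):
        term, _, weight = item.strip().partition("^")
        # Assumption: a term without an explicit '^weight' counts as 1.0.
        boosts[int(term)] = float(weight) if weight else 1.0
    return boosts


def constant_score(values, spec):
    """Sum the boost of every listed term present in the doc's array field.

    Assumption: boosts are additive; docs with no listed term score 0.
    """
    boosts = parse_boosts(spec)
    return sum(boosts.get(v, 0.0) for v in set(values))


spec = "10086^20,10085^10"
print(constant_score([10086, 3], spec))      # 20.0
print(constant_score([10085, 10086], spec))  # 30.0
print(constant_score([7], spec))             # 0.0
```

A doc without either category id still passes through with a score of 0, which is the "may match, may not match" behaviour being asked for.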

dyuyang commented 1 year ago

To add: in effect this means implementing something like ES's rank feature query, where a doc gets extra score if a given field equals a given value: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-rank-feature-query.html

This check could be done by iterating inside the UDF, but if there are many such condition fields, performance may be poor.

xuxijie commented 1 year ago

Would RANK meet your needs? With A RANK B, A must match, while B may or may not match.
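To make the A RANK B semantics concrete, here is a small Python sketch over in-memory docs. It only models the behaviour described in this comment (A filters, B never filters); the function `rank_query`, the sample docs, and the idea that B's matches add a fixed weight to the score are all illustrative assumptions, not Havenask's actual scorer.

```python
def rank_query(docs, must, optional_boosts):
    """Return (doc_id, score) pairs for docs matching `must`, best score first.

    `optional_boosts` plays the role of B: it never filters a doc out,
    it only contributes to the score when one of its terms is present.
    """
    hits = []
    for doc in docs:
        if not must(doc):
            continue  # A is required: non-matching docs are dropped
        score = sum(
            weight
            for term, weight in optional_boosts.items()
            if term in doc["attr_catid"]
        )
        hits.append((doc["id"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sample data.
docs = [
    {"id": 1, "title": "phone case", "attr_catid": [10086, 7]},
    {"id": 2, "title": "phone", "attr_catid": [3]},
    {"id": 3, "title": "laptop", "attr_catid": [10085, 10086]},
]

hits = rank_query(docs, lambda d: "phone" in d["title"], {10086: 20, 10085: 10})
print(hits)  # [(1, 20), (2, 0)]
```

Doc 3 carries both boosted ids but is dropped because it fails A, while doc 2 is kept with a score of 0 because B is optional.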

arscarrot commented 1 year ago

> Would RANK meet your needs? With A RANK B, A must match, while B may or may not match.

Is RANK meant to be used like this: contain(xxx, 'xxx') RANK contain(attr_catid, '10085|10086')? I tried it and got an error: failed to get sql plan, error message is : [IQUAN_EC_INTERNAL_ERROR] internal error : org.apache.calcite.sql.parser.SqlParseException: Encountered "RANK" at line 1, column 102. Was expecting one of:

"EXCEPT" ... "FETCH" ... "FILTER" ... "GROUP" ... "HAVING" ... "INTERSECT" ... "LIMIT" ... "OFFSET" ... "ORDER" ... "OVER" ... "MINUS" ... "UNION" ... "WINDOW" ... "WITHIN" ... "." ... "NOT" ... "IN" ... "<" ... "<=" ... ">" ... ">=" ... "=" ... "<>" ... "!=" ... "BETWEEN" ... "LIKE" ... "ILIKE" ... "RLIKE" ... "SIMILAR" ... "+" ... "-" ... "*" ... "/" ... "%" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "CONTAINS" ... "OVERLAPS" ... "EQUALS" ... "PRECEDES" ... "SUCCEEDS" ... "IMMEDIATELY" ... "MULTISET" ... "[" ... "FORMAT" ... "IGNORE" ... "RESPECT" ...

arscarrot commented 1 year ago

Using RANK solved this problem. For reference, see https://havenask.net/#/doc/sql/query_grammar/custom_function/udf/intro#QUERY