Can ha3 support scoring similar to ES `should` clauses?

arscarrot commented 1 year ago

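For example, I have an array field attr_catid and want to score docs that contain 10085 or 10086. I found that this only works with a NUMBER index. The query looks something like:

select constant_score('attr_catid', '10086^20,10085^10') where contain(xxx, 'xxx') andor contain(attr_catid, '10085|10086')

Here constant_score('attr_catid', '10086^20,10085^10') is a UDF, contain(xxx, 'xxx') is the match condition, and contain(attr_catid, '10085|10086') is only there to fetch the corresponding terms, not to filter matches. However, there is currently no syntax like A ANDOR B, where A must match and B may or may not match. Among the matched docs, a doc whose attr_catid field contains 10085 or 10086 gets extra score, and a doc without them gets none. This is a common pattern in ES. Is there a similar scenario internally? Looking forward to your replies.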
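To make the intended scoring concrete, here is a minimal Python sketch of what a UDF like constant_score might compute. It only illustrates the semantics described above and is not Havenask code: the function names parse_boosts and constant_score_sketch are hypothetical, and summing the boosts of all matching terms is an assumption (the post does not say whether boosts are summed or the maximum is taken).

```python
def parse_boosts(spec: str) -> dict[str, float]:
    """Parse a spec such as '10086^20,10085^10' into {term: boost}."""
    boosts: dict[str, float] = {}
    for item in spec.split(","):
        term, _, weight = item.strip().partition("^")
        # Assumption: a term without an explicit '^weight' gets boost 1.0.
        boosts[term] = float(weight) if weight else 1.0
    return boosts


def constant_score_sketch(values, spec: str) -> float:
    """Sum the boosts of the distinct field values listed in the spec.

    Values that are not listed contribute nothing, so a doc without
    any listed term scores 0.0.
    """
    boosts = parse_boosts(spec)
    return sum((boosts.get(str(v), 0.0) for v in set(values)), 0.0)


spec = "10086^20,10085^10"
both = constant_score_sketch([10085, 10086, 3], spec)   # both terms present
one = constant_score_sketch([10086], spec)              # only one term present
none = constant_score_sketch([1, 2], spec)              # no listed term
```

With this spec, a doc holding both category ids gets the two boosts added together, while a doc with neither keeps a score of zero but is not filtered out.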

dyuyang commented 1 year ago

To add to this: what we want is essentially something like ES's Rank feature query, where a doc gets extra score if a given field equals a given value. https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-rank-feature-query.html

This check could be done by iterating inside the UDF, but if there are many such condition fields, performance may be poor.
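The cost concern can be seen in a rough Python sketch of such an in-UDF traversal. This is an illustration under assumptions, not Havenask's UDF API: score_doc is a hypothetical name, brand_id is a made-up second field, and each array is assumed to hold no duplicate values. The work per doc grows with the total number of values across all condition fields, even when each lookup is a constant-time dictionary access.

```python
def score_doc(doc: dict, field_boosts: dict) -> float:
    """Add up boosts over every condition field of one doc.

    field_boosts maps field name -> {value: boost}. The nested loop
    visits every value of every listed field, which is the per-doc
    cost the comment above is worried about.
    """
    total = 0.0
    for field, boosts in field_boosts.items():
        for value in doc.get(field, ()):
            total += boosts.get(value, 0.0)
    return total


# Hypothetical doc and boost tables, for illustration only.
doc = {"attr_catid": [10085, 10086], "brand_id": [7]}
field_boosts = {
    "attr_catid": {10086: 20.0, 10085: 10.0},
    "brand_id": {7: 5.0},
}
total = score_doc(doc, field_boosts)
```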

xuxijie commented 1 year ago

Would RANK meet your needs? With A RANK B, A must match, while B may or may not match.
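The semantics described here (A filters, B only influences scoring) can be sketched in Python as follows. This models the behavior as stated in the comment and is not how Havenask implements RANK; rank_search, the predicate, and the sample docs are all hypothetical.

```python
def rank_search(docs, must, field, optional_boosts):
    """Return (score, id) pairs for docs matching `must`, best first.

    `must` plays the role of A: docs failing it are dropped.
    `optional_boosts` plays the role of B: it never drops a doc,
    it only raises the score of docs that happen to match.
    """
    hits = []
    for doc in docs:
        if not must(doc):
            continue  # A is required
        score = sum(
            (optional_boosts.get(v, 0.0) for v in set(doc.get(field, ()))),
            0.0,
        )
        hits.append((score, doc["id"]))
    hits.sort(key=lambda hit: -hit[0])  # highest score first
    return hits


docs = [
    {"id": 1, "title": "other", "attr_catid": [10086]},  # fails A
    {"id": 2, "title": "xxx", "attr_catid": [1]},        # A only
    {"id": 3, "title": "xxx", "attr_catid": [10086]},    # A and B
]
hits = rank_search(
    docs,
    must=lambda d: d["title"] == "xxx",
    field="attr_catid",
    optional_boosts={10086: 20.0, 10085: 10.0},
)
```

Doc 1 is excluded even though it matches B, doc 2 is kept with a zero score, and doc 3 ranks first.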

arscarrot commented 1 year ago

> Would RANK meet your needs? With A RANK B, A must match, while B may or may not match.

Is RANK meant to be used like this: contain(xxx, 'xxx') RANK contain(attr_catid, '10085|10086')? I tried it and got an error: failed to get sql plan, error message is : [IQUAN_EC_INTERNAL_ERROR] internal error : org.apache.calcite.sql.parser.SqlParseException: Encountered "RANK" at line 1, column 102. Was expecting one of:

"EXCEPT" ... "FETCH" ... "FILTER" ... "GROUP" ... "HAVING" ... "INTERSECT" ... "LIMIT" ... "OFFSET" ... "ORDER" ... "OVER" ... "MINUS" ... "UNION" ... "WINDOW" ... "WITHIN" ... "." ... "NOT" ... "IN" ... "<" ... "<=" ... ">" ... ">=" ... "=" ... "<>" ... "!=" ... "BETWEEN" ... "LIKE" ... "ILIKE" ... "RLIKE" ... "SIMILAR" ... "+" ... "-" ... "*" ... "/" ... "%" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "CONTAINS" ... "OVERLAPS" ... "EQUALS" ... "PRECEDES" ... "SUCCEEDS" ... "IMMEDIATELY" ... "MULTISET" ... "[" ... "FORMAT" ... "IGNORE" ... "RESPECT" ...

arscarrot commented 1 year ago

Using RANK solved the problem. For reference, see https://havenask.net/#/doc/sql/query_grammar/custom_function/udf/intro#QUERY

alibaba / havenask

Can ha3 support scoring similar to ES `should` clauses? #216