Open rui-mo opened 1 month ago
Yeah leafCallToSubfieldFilter
is Presto specific (similar to Tokenizer
). Mixing in Spark names probably is not a good way to do it. A related question is how do we parse this expression in ExprCompiler
, do we have a isnotnull
function in sparksql
?
One way to fix it is to have
class ExprToSubfieldFilterParser {
std::unique_ptr<common::Filter> leafCallToSubfieldFilter(
const core::CallTypedExpr&,
common::Subfield&,
core::ExpressionEvaluator*,
bool negated = false);
};
And have this stored as shared_ptr
inside ExecCtx
, or another context object with query scope.
@Yuhta Thanks for your suggestion. I took further look and these are my findings.
To pass the instance of 'ExprToSubfieldFilterParser' through ctx, the workflow would be:
Another option I can think of is to provide a parser factory which allows the registration of customized filter parser. This way would be lighter than the first one as it does not bring API change. What are your thoughts? Thanks!
Yes we can use a global factory for now until someone wants to have multiple engines in the same process (unlikely in short term).
Description
Gluten implemented some logic to convert call expr as subfield filter. To avoid duplication, we would like to use the existing 'leafCallToSubfieldFilter' logic in Velox. One incompatibility we found is current implementation is matching against Presto function names, while Spark function names could be different. The other one is 'isnotnull' is frequently used in Spark and it is not currently supported. These are the drafted changes we would like to propose: https://github.com/rui-mo/velox/commit/73409033e5760a5c664c57adc282095910f257f6.