Open akroviakov opened 2 years ago
@pdamme When performing branching in AggRow.h
, the warning about DCTX(ctx)
argument is being displayed for the instantiation of AggRow<DenseMatrix<int>, DenseMatrix<const char*>>
(i.e., IDXMAX/IDXMIN
ops that are implemented in-place and the code for should not compile past these ops), is it allowed to add [[maybe_unused]]
to that argument in apply()
?
The reason the code should not compile past these ops is that when the compiler tries to compile the rest, the line with *valuesRes = agg;
will fail, as it tries to perform the conversion from const char*
to int
, so we have to cut this branch and with that, the usage of ctx
.
I assume you mean you get a warning saying that ctx
is unused in AggRow<...>::apply()
? That surprises me a little, because we have many kernels that do not use ctx
and don't cause such a warning. If it helps, feel free to add [[maybe_unused]]
.
Yes, I see that the code for some aggregation functions cannot be compiled for const char *
as the input value type. I would be curious to know how you "cut" the branch.
For ewLike
it makes sense to use std::regex
. However then, in EwBinarySca.h
, when using MAKE_EW_BINARY_SCA
or a separate template specialization, where lhs
is a string and rhs
is a pattern, we have to face a choice:
std::regex
for template specialization as an argument and construct it elsewhere before the call.The first case means a serious performance hit.
The second one is more about the construction place of that regex and the dependency on std::regex
for users of EwBinarySca.h
. Moreover, since the main user of EwBinarySca
is EwBinaryMat
and the general Like
operation expects inputs <DenseMatrix, columnIdx, pattern>
, we'd need a new specialization of EwBinaryMat
just for Like
operation.
I already implemented a primitive form of Like
as a simple kernel, should I pursue EwBinaryMat
specialization or could a simple kernel be used for now?
Very good observation regarding Like
and regex. First of all, as commented on PR #418, Like
should really be like any other elementwise binary comparison (no special kernel with a custom interface). In particular, we need at least:
EwBinarySca
), such that we can say b = "hello" like "%lo";
on scalars, just like we could say b = "hello" == "lo";
. EwBinarySca
should get the pattern as a string (const char *
for now) and create the std::regex
internally. That's simple and doesn't cause any overhead for the one-time use on scalars.EwBinaryObjSca
), such that we can say bv = myFrame[, "name"] like "foo%";
. The implementation could be based on EwUnarySca
, for simplicity (your point 1). You are right, the repeated creation of the regex
would be quite inefficient, but as a first step, the feature is more important that its performance. As a second step, we could implement a special case (code branch) in EwBinaryObjSca
for the LIKE
op-code, which creates the regex
once and reuses it for all comparisons, by not invoking EwBinarySca
.EwBinaryMat
), where each element in the left-hand input is matched against the corresponding pattern in the right-hand input. Thus, also one-time use (unless there are duplicate patterns).All three variants could be used from SQL in DaphneDSL as well.
@pdamme I'm trying to implement like
in DSL just like other ewOps and stumbled upon the following:
I can use like
in DSL for
x = "eHelloe" like "e%e";
print(x ? true : false);
but not for
x = "eHelloe" like "e%e";
print(x);
the error is
/daphne/thirdparty/llvm-project/mlir/lib/IR/BuiltinAttributes.cpp:248: int64_t mlir::IntegerAttr::getInt() const: Assertion `(getType().isIndex() || getType().isSignlessInteger()) && "must be signless integer"' failed. Aborted (core dumped)
This error can be fixed by adding utils.castBoolIf()
in the visitor, though I'm not sure why is it needed, if folding returns bool anyways:
// DaphneDialect.cpp
mlir::OpFoldResult mlir::daphne::EwLikeOp::fold(ArrayRef<Attribute> operands) {
...
if(operands[0].isa<StringAttr>() && operands[1].isa<StringAttr>()) {
...
return mlir::BoolAttr::get(lhs.getContext(), std::regex_match(lhs.getValue().str(), toMatch));
}
Then I tried to do x = [eHelloe] like "e%e";
, but got:
JIT session error: Symbols not found: [ _cast__bool__DenseMatrix_char, _ewLike__DenseMatrix_char__DenseMatrix_char__char ]
Regardless of the cast, ewLike tries to return DenseMatrix of strings, and it is obvious, since Daphne_EwBinaryOp
in DaphneOps.td
does not differentiate for in/out types. So I tried to change it:
class Daphne_EwBinaryOp<string name, Type scalarTypeIN, Type scalarTypeOUT, list<OpTrait> traits = []>
...
let arguments = (ins AnyTypeOf<[MatrixOf<[scalarTypeIN]>, scalarTypeIN]>:$lhs, AnyTypeOf<[MatrixOf<[scalarTypeIN]>, scalarTypeIN]>:$rhs);
let results = (outs AnyTypeOf<[MatrixOf<[scalarTypeOUT]>, scalarTypeOUT]>:$res);
let builders = [
OpBuilder<(ins "Value":$lhs, "Value":$rhs), [{
// TODO This is wrong if lhs is a scalar and rhs is a matrix.
$_state.addTypes(lhs.getType());
$_state.addOperands({lhs, rhs});
}]>
];
and define ewLike
as
def Daphne_EwLikeOp : Daphne_EwBinaryOp<"ewLike", StrScalar, IntScalar>;
Now I get
error: 'daphne.ewLike' op result #0 must be matrix of integer or placeholder for an unknown type values or integer, but got '!daphne.Matrix<?x?x!daphne.String>'
module {
func @main() {
%0 = "daphne.constant"() {value = 94092562866896 : ui64} : () -> ui64
%1 = "daphne.constant"() {value = "e%e"} : () -> !daphne.String
%2 = "daphne.constant"() {value = true} : () -> i1
%3 = "daphne.constant"() {value = false} : () -> i1
%4 = "daphne.matrixConstant"(%0) : (ui64) -> !daphne.Matrix<?x?x!daphne.String>
%5 = "daphne.ewLike"(%4, %1) : (!daphne.Matrix<?x?x!daphne.String>, !daphne.String) -> !daphne.Matrix<?x?x!daphne.String>
%6 = "daphne.cast"(%5) : (!daphne.Matrix<?x?x!daphne.String>) -> i1
"daphne.print"(%6, %2, %3) : (i1, i1, i1) -> ()
"daphne.return"() : () -> ()
}
}
So ewLike
still tries to return DenseMatrix of strings.
We can notice the line $_state.addTypes(lhs.getType());
in Daphne_EwBinaryOp
, where we infer the result type, and that is why we see but got '!daphne.Matrix<?x?x!daphne.String>'
.
One could leave the original Daphne_EwBinaryOp
untouched (i.e., no explicit return type), copy its body to Daphne_EwLikeOp
and change it, so that it operates on fixed types. But it seems that I cannot easily change the argument of $_state.addTypes(lhs.getType());
to anything, other than what I have (lhs/rhs).
What would be your advise on how to explicitly specify the return type in this situation?
Alright, the first issue above was due to the EwLikeOp::fold()
returning BoolAttr
, instead of IntegerAttr
.
I couldn't find a way to solve the second issue in a general Daphne_EwBinaryOp
manner, so for now it is similar to Daphne_EwBinaryOp
, but without type inference (there's nothing to infer). I shifted the res
type specification to DaphneDSLVisitor.cpp
, so now the call looks like
if(op == "like"){
if(lhs.getType().isa<mlir::daphne::MatrixType>())
return static_cast<mlir::Value>(builder.create<mlir::daphne::EwLikeOp>(loc, utils.matrixOf(resType), lhs, rhs));
else
return static_cast<mlir::Value>(builder.create<mlir::daphne::EwLikeOp>(loc, resType, lhs, rhs));
}
Would be interesting to know if I've missed some way to implement it in a (more) general manner.
As you have already found out, the problem is connected to type inferencem, but also to type constraints. It should be fixed in a pragmatic way for this PR, since I'm refactoring type inference in another branch anyway. For some background: the definition of a op in DaphneOps.td
, e.g., def Daphne_EwLikeOp : Daphne_EwBinaryOp<"ewLike", StrScalar, IntScalar>;
merely defines constraints on the input/output types (besides other things), but doesn't do the inference. The constraints are automatically checked at various points in time, and any type inference must respect these constraints.
Currently, EwBinaryOp
expects the same value type scalarType
for its inputs and output (DaphneOps.td
line 214f). That needs to be generalized to support different value types. Once done, it should be possible to define EwLikeOp
as a EwBinaryOp
(or even better EwCmpOp
), and we could save some of the extra interface implementations for EwLikeOp
.
Furhermore, I recommend adapting the type inference in the custom builder in DaphneOps.td
line 217ff (on your branch)
let builders = [
OpBuilder<(ins "Value":$lhs, "Value":$rhs), [{
// TODO This is wrong if lhs is a scalar and rhs is a matrix.
$_state.addTypes(lhs.getType());
$_state.addOperands({lhs, rhs});
}]>
];
Inside the code, you can access an mlir::Builder
by $_builder
. For instance, you could say $_state.addTypes($_builder.getIntegerType(64, true));
, if the inputs are (matrices of) strings (see https://mlir.llvm.org/docs/OpDefinitions/#custom-builder-methods, https://mlir.llvm.org/doxygen/classmlir_1_1Builder.html).
Hope that helps.
Side note regarding constant folding: Constant folding happens at certain stages in the compilation chain, but we already have IR before that, and it must not violate constraints, e.g., on the input/output types. Generally, it should also work without folding, because there can be cases where the inputs are not compile-time constants.
With the introduction of
DenseMatrix
for strings, we need to support kernels for strings, as well as add new string-specific kernels. Specializing templates for strings may result in possibly severe code duplication. To address this issue, a conditional compilation (of a string-specific branch) within templates is preferred.One of the main drawbacks in this regard of the current "naive" implementation is the usage of
const char*
to store strings, which requires heap allocation (and subsequent deletion) of temporaries before they can beset()
in the resultDenseMatrix
.Objectives:
Introduce elementwise string-specific kernels: 1.1. Unary:
ewLower
,ewUpper
1.2. Binary:ewConcat
,ewLike
Extend existing kernels (not all of them, just try to find a good way to handle different classes of kernels) to support strings: 2.1. Output of new strings:
fill
,matrixConstant
2.2. Only output of existing strings:transpose
2.3. No output of strings: elementwise binary comparisons (lexicographical order) 2.4. Output of single string: aggregations