Closed juripetersen closed 7 months ago
@juripetersen Thanks for the contributions. Could you please check couple of comments I left?
@juripetersen Thanks for the contributions. Could you please check couple of comments I left?
@kbeedkar Sure, however I can't seem to find your comments yet.
I can see the problem of having a cyclic reference and will remove that.
Discussing offline with @juripetersen, we saw that there may be the possibility to use a JoinDescriptor similar to this one that is for the projections: https://github.com/apache/incubator-wayang/blob/main/wayang-commons/wayang-basic/src/main/java/org/apache/wayang/basic/function/ProjectionDescriptor.java
@kbeedkar what do you think?
That makes sense. However, I suggest checking if we can follow the same approach as for filters. https://github.com/apache/incubator-wayang/blob/5e6a07edaae3e7ebac9ac73814f95bafe741845d/wayang-platforms/wayang-postgres/src/main/java/org/apache/wayang/postgres/operators/PostgresFilterOperator.java#L30
i.e., the postgresJoin's constructor takes in two arguments of type TransformationDescriptor<Record, Key>
@kbeedkar the PostgresFilterOperator
uses the sqlImplementation
and the executor only compiles this "hardcoded" information into the desired SQL query string. This can be seen in the JDBC FunctionCompiler
:
public class FunctionCompiler {
/**
* Compile a predicate to a SQL {@code WHERE} clause.
*
* @param descriptor describes the predicate
* @return a compiled SQL {@code WHERE} clause
*/
public String compile(PredicateDescriptor descriptor) {
final String sqlImplementation = descriptor.getSqlImplementation();
assert sqlImplementation != null;
return sqlImplementation;
}
}
This is also where the currently proposed solution of this PR was inspired from. @zkaoudi and I were discussing an approach to avoid this and actually use a generic interface in Joins. However, this seems to be non-trivial and is likely to break backwards compatibility.
I can imagine using something like this when defining the Join:
final ExecutionOperator joinOperator = new HsqldbJoinOperator<Integer>(
new JoinDescriptor<Record, Record, Integer>(
"tableName1",
"tableName2",
"keyName1",
"keyName2",
Record.class,
Record.class,
Integer.class
)
);
Where JoinDescriptor
extends TransformationDescriptor
and JdbcJoinOperator
can only be derived from JoinOperator
s that are defined with JoinDescriptor
.
But this would obviously only work for Jdbc based Joins and not for Java or spark joins since the FunctionDescriptor is omitted. So the solution proposed right now with the extension of TransformationDescriptor seems to be better than the JoinDescriptor.
Yeah, we wouldn't want to break things for Spark and Java. So maybe we merge your PR with the sqlImplementation and create an issue about this for all operators, as it requires a more general approach.
@juripetersen can you please check? There are some compilation errors that make the build fail.
Yes, extending the TransformationDescriptor sounds good.
@zkaoudi will sort the compilation out now and push afterwards.
@zkaoudi @kbeedkar compilation works fine locally now.
References #384
The proposed solution utilises a somewhat "hacky" solution to the problem of providing a generic interface for JoinOperators. This is taken from previous examples of
ProjectionDescriptor
that allow to give a "hardcoded"sqlImplementation
that specifies how the Keys are to be retrieved.As you can see in the provided test, this results in JoinOperators taking two string identifiers that specify the tables name and key to join on:
Thus, the
JoinMapping
specifies the need of presence for these values in the mapping itself: