daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

901 inner join and semi join with result cardinality hint #918

Open saminbassiri opened 3 days ago


Enhance Join Operations with Customizable Result Size Allocation


PR Description

Enhancements to Join Operations

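This update adds an optional parameter, numRowRes, to the innerJoin and semiJoin DaphneDSL operations. The parameter is a hint for the number of rows in the join result, and the kernels use it to size the result allocation instead of always reserving the worst case. Addresses issue #901.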

Key Changes:

  1. Kernel changes:

    • innerJoin:
      • If numRowRes = -1, the result size defaults to numRowLhs * numRowRhs (the size of the cartesian product, i.e. the worst case).
      • Otherwise, the result size is defined by numRowRes.
    • semiJoin:
      • If numRowRes = -1, the result size defaults to numRowLhs.
      • Otherwise, the result size is defined by numRowRes.
  2. DaphneDSL Updates:

    • numRowRes is now an optional argument for innerJoin and semiJoin.
    • Defaults to -1 if not provided.
  3. DaphneIR Adjustments:

    • numRowRes is now a mandatory argument for InnerJoinOp and SemiJoinOp.
  4. Implementation Updates:

    • Modified DaphneDSLBuiltins.cpp to pass -1 for numRowRes when the caller omits it.
    • Updated SQLVisitor.cpp to pass -1 as numRowRes, so joins generated from SQL use the default sizing.
    • Adjusted kernels.json to reflect the new parameter for relevant operations.
  5. Testing:

    • Added script-level test cases to validate correct behavior across various scenarios.
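
For reviewers, the sizing rule from item 1 can be sketched as follows. This is a minimal illustration, not the kernel code from this PR: the helper names innerJoinResultRows and semiJoinResultRows and the constant kNoHint are hypothetical, and both the overflow check and the rejection of negative values other than -1 are assumptions that the description above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical names for illustration; not taken from the DAPHNE code base.
constexpr int64_t kNoHint = -1;

// innerJoin: without a hint, reserve the worst case (cartesian product);
// with a hint, reserve exactly numRowRes rows.
inline size_t innerJoinResultRows(size_t numRowLhs, size_t numRowRhs, int64_t numRowRes) {
    if (numRowRes == kNoHint) {
        // Assumption: guard against overflow of the product in debug builds.
        assert(numRowLhs == 0 ||
               numRowRhs <= std::numeric_limits<size_t>::max() / numRowLhs);
        return numRowLhs * numRowRhs;
    }
    // Assumption: any negative value other than -1 is treated as an error.
    if (numRowRes < 0)
        throw std::invalid_argument("numRowRes must be -1 or non-negative");
    return static_cast<size_t>(numRowRes);
}

// semiJoin: without a hint, the result has at most numRowLhs rows;
// with a hint, reserve exactly numRowRes rows.
inline size_t semiJoinResultRows(size_t numRowLhs, int64_t numRowRes) {
    if (numRowRes == kNoHint)
        return numRowLhs;
    // Assumption: same error handling as in innerJoinResultRows.
    if (numRowRes < 0)
        throw std::invalid_argument("numRowRes must be -1 or non-negative");
    return static_cast<size_t>(numRowRes);
}
```

With inputs of 3 and 4 rows, the default for innerJoin reserves 12 rows, while a hint of 5 reserves 5. How the kernels react when the actual result exceeds the hint is not covered by this sketch.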

Bug Fixes