apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[flink] Support count star push down to source for append table #4236

Closed JingsongLi closed 2 months ago

JingsongLi commented 2 months ago

Purpose

Implements Flink's SupportsAggregatePushDown interface so that COUNT(*) can be pushed down to the source, which improves query performance.

This only works on append tables without deletion vectors.
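
For readers unfamiliar with the idea, here is a minimal, self-contained sketch of why the restriction matters. It does not use the Paimon or Flink APIs, and it is not the code in this PR. DataFileMeta, tryPushDownCount and their parameters are hypothetical names made up for illustration. The assumption is that each data file records its row count in metadata, so COUNT(*) can be answered by summing those counts for the matching partitions. When deletion vectors are enabled, the recorded counts may include deleted rows, so the sketch falls back to a normal scan.

```java
import java.util.List;
import java.util.OptionalLong;

public class CountStarPushDownSketch {

    /** Hypothetical stand-in for per-file metadata; not a real Paimon class. */
    public static final class DataFileMeta {
        final String partition;
        final long rowCount;

        public DataFileMeta(String partition, long rowCount) {
            this.partition = partition;
            this.rowCount = rowCount;
        }
    }

    /**
     * Tries to answer COUNT(*) from file metadata alone.
     *
     * @param appendTable true if the table is an append table
     * @param deletionVectorsEnabled true if deletion vectors are enabled
     * @param files metadata of the data files in the table
     * @param partitionFilter partition value to match, or null for no filter
     * @return the row count, or empty if the caller must fall back to a scan
     */
    public static OptionalLong tryPushDownCount(
            boolean appendTable,
            boolean deletionVectorsEnabled,
            List<DataFileMeta> files,
            String partitionFilter) {
        // Assumption: only append tables without deletion vectors have
        // per-file row counts that equal the number of visible rows.
        if (!appendTable || deletionVectorsEnabled) {
            return OptionalLong.empty();
        }
        long total = 0;
        for (DataFileMeta file : files) {
            if (partitionFilter == null || partitionFilter.equals(file.partition)) {
                total += file.rowCount;
            }
        }
        return OptionalLong.of(total);
    }

    public static void main(String[] args) {
        // Mirrors the partitioned case: two rows in dt = '1', one row in dt = '2'.
        List<DataFileMeta> files =
                List.of(new DataFileMeta("1", 2L), new DataFileMeta("2", 1L));
        System.out.println(tryPushDownCount(true, false, files, "1"));
        System.out.println(tryPushDownCount(true, true, files, "1"));
    }
}
```

In this sketch an empty result means "do not push down", which corresponds to the planner keeping the regular aggregation over a full scan.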

Valid cases are:

@Test
public void testCountStarAppend() {
    sql("CREATE TABLE count_append (f0 INT, f1 STRING)");
    sql("INSERT INTO count_append VALUES (1, 'a'), (2, 'b')");

    String sql = "SELECT COUNT(*) FROM count_append";
    assertThat(sql(sql)).containsOnly(Row.of(2L));
    validateCount1PushDown(sql);
}

@Test
public void testCountStarPartAppend() {
    sql("CREATE TABLE count_part_append (f0 INT, f1 STRING, dt STRING) PARTITIONED BY (dt)");
    sql("INSERT INTO count_part_append VALUES (1, 'a', '1'), (1, 'a', '1'), (2, 'b', '2')");

    String sql = "SELECT COUNT(*) FROM count_part_append WHERE dt = '1'";
    assertThat(sql(sql)).containsOnly(Row.of(2L));
    validateCount1PushDown(sql);
}

Tests

API and Format

Documentation