apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core] Introduce withReadType in ReadBuilder #4214

Status: Closed (Zouxxyy closed this pull request 2 months ago)

Zouxxyy commented 2 months ago

Purpose

Related issue: #4209

Public API

    /**
     * Push the read row type to the reader; supports nested row pruning.
     *
     * @param readType read row type, can be a subset of {@link Table#rowType()}
     * @since 1.0.0
     */
    ReadBuilder withReadType(RowType readType);
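The Javadoc says readType "can be a subset of Table#rowType()". A minimal sketch of what such a subset check could look like follows, assuming fields are matched by field id (the example in this PR gives every field, including nested ones, a unique id) and then compared by name and nesting. ReadTypeSubsetSketch, Field, f and isSubset are hypothetical names for illustration only; this is not Paimon's RowType/DataField API, and field order is not checked.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not Paimon code): is a read type a subset of the table row type? */
final class ReadTypeSubsetSketch {
    /** Simplified stand-in for a row-type field: id, name and, for nested ROW types, children. */
    static final class Field {
        final int id;
        final String name;
        final List<Field> children; // empty for atomic types such as INT

        Field(int id, String name, Field... children) {
            this.id = id;
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    static Field f(int id, String name, Field... children) {
        return new Field(id, name, children);
    }

    /** True if every read field exists in the table fields with the same id, name and nesting. */
    static boolean isSubset(List<Field> readFields, List<Field> tableFields) {
        for (Field read : readFields) {
            Field match = null;
            for (Field table : tableFields) {
                if (table.id == read.id) {
                    match = table;
                    break;
                }
            }
            if (match == null || !match.name.equals(read.name)) {
                return false; // unknown id, or the same id under a different name
            }
            if (read.children.isEmpty() != match.children.isEmpty()) {
                return false; // atomic vs. nested mismatch
            }
            if (!isSubset(read.children, match.children)) {
                return false; // a nested child is not part of the table's nested row
            }
        }
        return true;
    }
}
```

With the writeType/readType pair from the usage example in this PR, isSubset returns true; a field with an unknown id, or a known id under a different name, is rejected.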

Mark ReadBuilder withProjection(int[][] projection) and ReadBuilder withProjection(int[] projection) as @Deprecated.

How to use

RowType writeType =
        DataTypes.ROW(
                DataTypes.FIELD(0, "pt", DataTypes.INT()),
                DataTypes.FIELD(1, "a", DataTypes.INT()),
                DataTypes.FIELD(2, "f0", DataTypes.INT()),
                DataTypes.FIELD(
                        3,
                        "f1",
                        DataTypes.ROW(
                                DataTypes.FIELD(4, "f0", DataTypes.INT()),
                                DataTypes.FIELD(5, "f1", DataTypes.INT()),
                                DataTypes.FIELD(6, "f2", DataTypes.INT()))));
// write
// GenericRow.of(0, 0, 0, GenericRow.of(10, 11, 12))

RowType readType =
        DataTypes.ROW(
                DataTypes.FIELD(
                        3,
                        "f1",
                        DataTypes.ROW(
                                DataTypes.FIELD(4, "f0", DataTypes.INT()),
                                DataTypes.FIELD(6, "f2", DataTypes.INT()))));

ReadBuilder readBuilder = table.newReadBuilder().withReadType(readType);

// read
// GenericRow.of(GenericRow.of(10, 12))
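The semantics of the example above (write (0, 0, 0, (10, 11, 12)), read ((10, 12))) can be modelled with a small self-contained sketch. It assumes rows are plain Java lists, nested rows are nested lists, and read fields are located in the write type by field id; NestedPruneSketch, Field, f and prune are hypothetical names, not Paimon classes, and a real reader would prune at the file-format level rather than copying rows like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: prune a nested row (modelled as List<Object>) to a read type by field id. */
final class NestedPruneSketch {
    static final class Field {
        final int id;
        final List<Field> children; // non-empty only for nested ROW fields

        Field(int id, Field... children) {
            this.id = id;
            this.children = Arrays.asList(children);
        }
    }

    static Field f(int id, Field... children) {
        return new Field(id, children);
    }

    /** Builds a row holding only the fields of readType, recursing into nested rows. */
    static List<Object> prune(List<Object> row, List<Field> writeType, List<Field> readType) {
        List<Object> out = new ArrayList<>();
        for (Field read : readType) {
            int pos = indexOfId(writeType, read.id);
            Object value = row.get(pos);
            if (read.children.isEmpty()) {
                out.add(value); // atomic field: copy the value as-is
            } else {
                @SuppressWarnings("unchecked")
                List<Object> nested = (List<Object>) value;
                // nested row: keep only the requested children, in read-type order
                out.add(prune(nested, writeType.get(pos).children, read.children));
            }
        }
        return out;
    }

    private static int indexOfId(List<Field> fields, int id) {
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i).id == id) {
                return i;
            }
        }
        throw new IllegalArgumentException("field id " + id + " not in write type");
    }
}
```

With writeType = [f(0), f(1), f(2), f(3, f(4), f(5), f(6))], readType = [f(3, f(4), f(6))] and the row [0, 0, 0, [10, 11, 12]], prune returns [[10, 12]], mirroring GenericRow.of(GenericRow.of(10, 12)).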
JingsongLi commented 2 months ago

We just need a pruneColumns(RowType requiredSchema) method.

Zouxxyy commented 2 months ago

RowType carries all the required information (field names, field IDs, nested structure, ...), so it can replace the index-based projection.

The final API will be changed to something like this:

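    @Deprecated
    default ReadBuilder withProjection(int[] projection) {
        // projection -> requiredSchema: keep the top-level fields at the given positions
        // (assumes readType() still returns the full, unpruned row type at this point)
        List<DataField> fields = readType().getFields();
        RowType requiredSchema =
                new RowType(Arrays.stream(projection).mapToObj(fields::get).collect(Collectors.toList()));
        return pruneColumns(requiredSchema);
    }

    ReadBuilder pruneColumns(RowType requiredSchema);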
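To illustrate the point that a RowType can stand in for an int[] projection, here is a self-contained sketch of the "projection -> requiredSchema" step. It assumes only top-level projection and models a field as just an id and a name (nested children are omitted for brevity); ProjectionToSchemaSketch, Field and toRequiredSchema are hypothetical names, not Paimon's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of turning a top-level int[] projection into a required schema. */
final class ProjectionToSchemaSketch {
    static final class Field {
        final int id;
        final String name;

        Field(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return id + ":" + name;
        }
    }

    /** Picks the fields at the given top-level positions, keeping their ids and names. */
    static List<Field> toRequiredSchema(List<Field> rowType, int[] projection) {
        List<Field> required = new ArrayList<>(projection.length);
        for (int index : projection) {
            // an out-of-range index throws IndexOutOfBoundsException
            required.add(rowType.get(index));
        }
        return required;
    }
}
```

For a row type [0:pt, 1:a, 2:f0, 3:f1] and the projection {3, 1}, the result prints as [3:f1, 1:a]. Because each selected field keeps its id and name, the projected schema carries strictly more information than the bare indices, which is why an index projection can be expressed as a required schema but not the other way round once nested pruning is involved.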
Zouxxyy commented 2 months ago

@JingsongLi Thanks for the review, updated.