apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Is it possible to add a set of existing partitioned parquet files to the Iceberg table via the Java Standalone API #9763

Open calvinlfer opened 4 months ago

calvinlfer commented 4 months ago

Query engine

I am using Parquet4S + Hadoop-AWS. Parquet4S is a wrapper around the Apache Parquet library that reads and writes Parquet data on object storage such as AWS S3. The data that Parquet4S reads and writes is accessible to Spark, Flink, and other engines.

Question

I am using Parquet4S to write a directory of partitioned Parquet files to S3. Is it possible to register these files with Apache Iceberg through the standalone Java API so that Iceberg can pick up the new Parquet data? Are there any Iceberg Java APIs that help with this process?

manuzhang commented 4 months ago

You may want to check out AddFilesProcedure#importFileTable, which uses the public Java API to import Parquet files into an Iceberg table.
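One step that registering Hive-style partitioned files usually involves is recovering each file's partition values from its location. The sketch below is only an illustration of that step using plain Java, not code taken from AddFilesProcedure. The class name `PartitionPaths`, the helper `partitionPath`, and the bucket path are hypothetical, and it assumes the files are laid out as `key=value` directories under a single table root.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPaths {

    /**
     * Returns the Hive-style partition portion of a data file location,
     * e.g. "year=2024/month=01", relative to the given table root.
     * Returns an empty string for a file directly under the root.
     */
    public static String partitionPath(String tableRoot, String fileLocation) {
        String root = tableRoot.endsWith("/") ? tableRoot : tableRoot + "/";
        if (!fileLocation.startsWith(root)) {
            throw new IllegalArgumentException(fileLocation + " is not under " + root);
        }
        String relative = fileLocation.substring(root.length());
        String[] segments = relative.split("/");
        List<String> partitions = new ArrayList<>();
        // The last segment is the file name; every earlier one must be key=value.
        for (int i = 0; i < segments.length - 1; i++) {
            String segment = segments[i];
            int eq = segment.indexOf('=');
            if (eq <= 0 || eq == segment.length() - 1) {
                throw new IllegalArgumentException("Not a key=value directory: " + segment);
            }
            partitions.add(segment);
        }
        return String.join("/", partitions);
    }

    public static void main(String[] args) {
        // Hypothetical location, used only for illustration.
        String root = "s3a://example-bucket/events";
        String file = root + "/year=2024/month=01/part-00000.parquet";
        System.out.println(partitionPath(root, file));
    }
}
```

With the Iceberg library on the classpath, a string like this could then be passed to `DataFiles.builder(spec).withPartitionPath(...)`, together with `withPath`, `withFormat(FileFormat.PARQUET)`, `withFileSizeInBytes`, and `withRecordCount`, and the resulting `DataFile` committed with `table.newAppend().appendFile(dataFile).commit()`. Treat that outline as an assumption to check against the Iceberg version in use; in particular, the partition keys must match the field names in the table's partition spec.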