gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

HDFS/Downloads Table Incremental Build #1078

Open fmendezh opened 1 week ago

TableBackfill supports creating tables partitioned by datasetKey, and Apache Iceberg supports the atomic update or replacement of table partitions, for example: INSERT OVERWRITE TABLE occurrence PARTITION (dataset_key). This feature can be used to keep the download table up to date incrementally, while the current full table build process is retained for disaster recovery and data synchronisation.
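
To make the idea concrete, here is a minimal sketch of how a per-dataset partition replacement statement could be assembled. It is an illustration only, not the pipelines implementation: the helper name `build_partition_overwrite_sql` and the source table `occurrence_staging` are hypothetical, and it assumes the staging table has the same column layout as the target table. Only the `INSERT OVERWRITE TABLE occurrence PARTITION (dataset_key)` clause comes from the description above.

```python
import uuid


def build_partition_overwrite_sql(dataset_key: str,
                                  target_table: str = "occurrence",
                                  source_table: str = "occurrence_staging") -> str:
    """Build a statement that replaces the partition of a single dataset.

    Hypothetical helper: the table names are placeholders, and the source
    table is assumed to share the target table's column layout.
    """
    # Dataset keys are UUIDs; parsing the key rejects anything else, so the
    # value can be embedded in the SQL text. Raises ValueError if invalid.
    key = str(uuid.UUID(dataset_key))
    return (
        f"INSERT OVERWRITE TABLE {target_table} PARTITION (dataset_key) "
        f"SELECT * FROM {source_table} WHERE dataset_key = '{key}'"
    )


if __name__ == "__main__":
    # Print the statement that would refresh one dataset's partition.
    print(build_partition_overwrite_sql("50c9509d-22c7-4a22-a47d-8c48425ef4a7"))
```

Because the SELECT only returns rows for one dataset, a dynamic partition overwrite should only touch that dataset's partition. If the statement is run through Spark, this likely requires `spark.sql.sources.partitionOverwriteMode` to be set to `dynamic`; otherwise the overwrite may not be limited to the partitions present in the query result.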