cloudera-labs / envelope

Build configuration-driven ETL pipelines on Apache Spark
Apache License 2.0

[KuduOutput] Allow Missing Columns in Output Table #22

Closed: rickysaltzer closed this 6 years ago

rickysaltzer commented 6 years ago

This patch allows the incoming Dataset to contain columns that are not present in the destination Kudu table.

It adds a new configuration option, allow.missing.columns (boolean, default: false). If true, columns that are not present in the destination table are ignored during the write.

For example, if the incoming Dataset contains 3 columns [name, age, email] and the Kudu table only has columns for [name, email], then the [age] column will be omitted during insertion.
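
To make the example above concrete, here is a rough sketch of the column filtering in plain Java. It is not the code from this patch: the class MissingColumnsSketch and the method columnsToWrite are hypothetical names, it uses no Spark or Kudu APIs, and it assumes that with the option left at false an unknown column is rejected rather than silently dropped. Column names are compared by exact match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names and error behaviour are assumptions,
// not taken from the Envelope code base.
class MissingColumnsSketch {

  // Returns the incoming columns that should be written to the table.
  // Columns absent from the table are skipped when allowMissingColumns is true.
  static List<String> columnsToWrite(List<String> incomingColumns,
                                     Set<String> tableColumns,
                                     boolean allowMissingColumns) {
    List<String> kept = new ArrayList<>();
    for (String column : incomingColumns) {
      if (tableColumns.contains(column)) {
        kept.add(column);
      } else if (!allowMissingColumns) {
        // Assumption: the default (false) treats an unknown column as an error.
        throw new IllegalArgumentException(
            "Column not present in destination table: " + column);
      }
      // Otherwise the column is ignored for this write.
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> incoming = Arrays.asList("name", "age", "email");
    Set<String> table = new HashSet<>(Arrays.asList("name", "email"));

    // With the option enabled, "age" is omitted from the write.
    System.out.println(columnsToWrite(incoming, table, true)); // prints [name, email]
  }
}
```

The order of the incoming columns is preserved, so only the position of the dropped column changes.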