CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apache License 2.0
75 stars 16 forks source link

Auto-detect schema change a recreate tables for HadoopConnectors #63

Open alexjbush opened 5 years ago

alexjbush commented 5 years ago

Expected Behavior

Auto-detect if a schema has changed for a table and recreate table accordingly, rather than requiring forceRecreateTables flag.

Actual Behavior

The forceRecreateTables must be set to true to recreate the tables, which sometime it is not and causes schema validation issues.

alexjbush commented 5 years ago

Some initial work here: https://github.com/CoxAutomotiveDataSolutions/waimak/tree/feature/auto-detect-schema-changes

I think the approach is wrong, maybe we should store schema information in a metadata directory in the output path. Something like $outputDir/.table_schemas/$tableName. That way we won't have to infer schema information.