Open shashank-google opened 2 years ago
Tested the following templates for Option 1; results are as follows:
Testing is in progress for the HIVEtoBQ template. For part 2 of the description, we can use the partitioning and clustering attributes exposed by the spark-bigquery connector. An additional check is needed on the partitioning field: we can only apply partitioning when it is a date/datetime/timestamp (or similar) field, given BigQuery's constraints.
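As a minimal sketch of that check (the type names and the plain-dict schema here are stand-ins for a real Spark `StructType`, not the template's actual code):

```python
# Sketch: decide whether a source column can drive BigQuery time partitioning.
# BigQuery time partitioning only supports DATE/DATETIME/TIMESTAMP columns
# (integer-range partitioning also exists but needs explicit bounds).
# Type strings below follow Spark SQL's simple names; adjust to taste.

PARTITIONABLE_TYPES = {"date", "timestamp"}  # Spark types mapping to BQ DATE/TIMESTAMP


def can_partition_on(field_name: str, schema: dict) -> bool:
    """Return True if `field_name` exists in `schema` with a partitionable type.

    `schema` is a simple {column_name: spark_type_string} mapping, used here
    as a stand-in for df.schema.
    """
    spark_type = schema.get(field_name, "").lower()
    return spark_type in PARTITIONABLE_TYPES
```

If the check passes, the write could then pass the connector's `partitionField` and `clusteredFields` options; if it fails, the template would skip partitioning rather than error out on the BigQuery side.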
Depends on child issues #631 and #632.
Option 1 - Test and verify: the user manually creates an empty table in BQ with partitioning and clustering, and the xyz-to-BQ template then moves data into it. What happens if the column order in the source (Avro / JDBC / Hive, etc.) does not match the existing table in BigQuery?
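One way to handle the column-order mismatch is to reorder the source DataFrame's columns to the target table's order before writing. A minimal sketch (the helper name and the plain-list inputs are illustrative, not existing template code):

```python
def align_columns(source_cols, target_cols):
    """Reorder source column names to match the target table's column order.

    Raises ValueError when the two column sets differ, since a pure reorder
    cannot reconcile schemas with missing or extra columns.
    """
    if set(source_cols) != set(target_cols):
        missing = sorted(set(target_cols) - set(source_cols))
        extra = sorted(set(source_cols) - set(target_cols))
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    # Target order wins; every name is guaranteed present in the source.
    return list(target_cols)
```

In Spark this would be applied as `df.select(*align_columns(df.columns, target_cols))` before the BigQuery write.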
Option 2 - Explore: if the BQ table does not exist (or the overwrite flag is supplied), how can the template automatically determine clustering and partitioning? Look at the corresponding Dataflow templates for ideas.
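One simple heuristic to explore (a sketch only, assuming a plain {column: spark_type} schema mapping rather than a real `StructType`): pick the first time-typed column as the partition candidate, and fall back to an unpartitioned table when none exists.

```python
from typing import Optional


def infer_partition_field(schema: dict) -> Optional[str]:
    """Return the first DATE/TIMESTAMP-typed column as a partition candidate,
    or None when the schema has no time-like column (write unpartitioned).
    """
    for name, spark_type in schema.items():
        if spark_type.lower() in {"date", "timestamp"}:
            return name
    return None
```

A real implementation would likely want this to be overridable by a user-supplied template argument, with the heuristic only as a default.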