etsy / boundary-layer

Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform
Apache License 2.0
262 stars 58 forks source link

Use enum to replace plain string in the export_format of bigquery_to_gcs #96

Closed rfan-debug closed 3 years ago

rfan-debug commented 3 years ago

People would easily make mistakes if they replace CSV with JSON in the export_format because JSON is not in the allowed format list.

https://github.com/apache/airflow/blob/1.10.3/airflow/contrib/hooks/bigquery_hook.py#L1096-L1098

The correct argument for exporting JSON should be NEWLINE_DELIMITED_JSON, which is implicit in the current yaml template. If you actually pass JSON to the export_format field, Airflow will automatically correct it into the default value CSV and you will get unexpected result even when you think you set the "correct" argument here.

Hereby we use the enum to replace the original plain string to enforce the argument check at the first place, so that you won't be confused by the automatic fallback to CSV.