Nike-Inc / brickflow

Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
https://engineering.nike.com/brickflow/
Apache License 2.0

Update SnowflakeOperator and UcToSnowflakeOperator to accept sql_file and custom dbx sql #116

Closed bsangars closed 3 months ago

bsangars commented 3 months ago

Update SnowflakeOperator and UcToSnowflakeOperator to accept a SQL file and custom dbx_sql queries as inputs

Description

This enhancement allows passing sql_file to SnowflakeOperator and custom SQL to UcToSnowflakeOperator for extracting data from Unity Catalog.

The following changes are implemented as part of this PR:

  1. Add a sql_file parameter to SnowflakeOperator
  2. Raise an exception if both query_string and sql_file are passed
  3. Add an additional parameter dbx_sql to UcToSnowflakeOperator to use custom SQL when extracting data from Unity Catalog
  4. Update the mandatory keys to expect one of dbx_sql or (dbx_catalog, dbx_database, dbx_table)
  5. Add an additional (optional) parameter write_mode to specify the write mode when writing to Snowflake from Unity Catalog
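The mandatory-key rule in item 4 can be sketched as below. This is a minimal illustration, not the code merged in this PR: the helper name `resolve_source` and the plain dict of parameters are assumptions, and only the key names (dbx_sql, dbx_catalog, dbx_database, dbx_table, write_mode) come from the description above.

```python
from typing import Any, Dict

# Key names as listed in the PR description; how the real operator stores
# them internally is not shown here.
TABLE_KEYS = ("dbx_catalog", "dbx_database", "dbx_table")


def resolve_source(params: Dict[str, Any]) -> str:
    """Hypothetical helper: return "sql" or "table" depending on the input.

    Accepts either a custom dbx_sql query or the full
    (dbx_catalog, dbx_database, dbx_table) triple, and raises otherwise.
    """
    if params.get("dbx_sql"):
        # Custom SQL takes precedence in this sketch (an assumption).
        return "sql"
    missing = [key for key in TABLE_KEYS if not params.get(key)]
    if missing:
        raise ValueError(
            "Provide dbx_sql or all of dbx_catalog, dbx_database, dbx_table; "
            "missing: " + ", ".join(missing)
        )
    return "table"
```

In this sketch write_mode is not validated at all, which matches its description as optional. Whether dbx_sql and the table keys may be combined is left to the actual operator.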

Related Issue

https://github.com/Nike-Inc/brickflow/issues/115

Motivation and Context

This enhancement accounts for the following use cases:

  1. Passing a SQL file instead of a query string (which would require an additional hop every time we call the operator)
  2. Passing custom SQL to extract data from Unity Catalog. The current code only accepts guard-rail SQL filters applied to select statements; with this enhancement, the user has better control over the source

How Has This Been Tested?

Tested the updated operators for multiple scenarios:

  1. using the sql file
  2. using query string
  3. using both (negative testing)

Tested the UcToSnowflakeOperator for the following scenarios:

  1. Incremental
  2. Full load with dbx_sql and tables
  3. write mode
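The first three scenarios (SQL file, query string, and both as a negative test) can be reproduced against a small stand-in. The function `resolve_sql` below is a hypothetical helper written for illustration; it is not the SnowflakeOperator implementation and only mirrors the behaviour described in this PR.

```python
from pathlib import Path
from typing import Optional


def resolve_sql(
    query_string: Optional[str] = None, sql_file: Optional[str] = None
) -> str:
    """Hypothetical helper: return the SQL text from exactly one source."""
    if query_string and sql_file:
        # Negative scenario: both inputs supplied.
        raise ValueError("Pass either query_string or sql_file, not both")
    if sql_file:
        # Scenario 1: read the statement(s) from a file on disk.
        return Path(sql_file).read_text(encoding="utf-8")
    if query_string:
        # Scenario 2: use the inline query as-is.
        return query_string
    raise ValueError("One of query_string or sql_file is required")
```

Raising when neither input is given is an extra assumption of this sketch; the PR description only states the exception for the case where both are passed.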

Screenshots (if appropriate):

Make check result

Screenshot 2024-05-15 at 11 51 33 AM

Updated brickflow operators workflow (four images)
