Nike-Inc / brickflow

Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
https://engineering.nike.com/brickflow/
Apache License 2.0
183 stars 36 forks source link

Add sql file path to Snowflake Operator and add custom sql for UctoSnowflakeOperator #121

Closed bsangars closed 3 months ago

bsangars commented 3 months ago

Update Snowflake and UctoSnowflakeOperator Operators to accept sql file and custom db_sql queries as inputs

Description

It is an enhancement to pass sql_file to Snowflake Operator and custom sql to extract data from Unity catalog.

Below are the changes implemented as part of this PR

  1. sql_file parameter is added to SnowflakeOperator ( File path is relative to brickflow_root)
  2. Raise an exception if both query_string and sql_file is passed
  3. Add an additional parameter dbx_sql to UcToSnowflakeOperator to use custom sql when extracting data from Unity Catalog
  4. Update Mandatory keys to expect one Of dbx_sql or (dbx_catalog, dbx_database, dbx_table)
  5. Add an additional (optional) parameter write_mode to specify while writing to Snowflake from Unity Catalog
  6. Add additional function to get brickflow project root and write unit test for the function

Related Issue

https://github.com/Nike-Inc/brickflow/issues/115

Motivation and Context

This is an enhancement to account for below use cases

  1. passing sql file instead of query string ( which would be an additional hop everytime we call operator)
  2. passing in custom sql to extract data from Unity catalog. Current code only accepts guard rail sql filters against select statements. using this enhancement user has better control on the source

How Has This Been Tested?

Tested the updated operators for multiple scenarios in local and dev environments

  1. using the sql file
  2. using query string
  3. using both (negative testing)
  4. Tested the UcToSnowflakeOperator for below scenarios
  5. Incremental
  6. Full load with dbx_sql and tables
  7. write mode
  8. Unit test for the new get_bf_project_root function

Types of changes

Screenshots (if appropriate):

Screenshot 2024-05-23 at 2 09 01 PM Screenshot 2024-05-23 at 2 17 06 PM Screenshot 2024-05-23 at 2 16 46 PM

Local testing -

Screenshot 2024-05-23 at 2 30 28 PM

Dev Testing ( from gitHub) -

Screenshot 2024-05-23 at 2 31 57 PM Screenshot 2024-05-23 at 2 35 36 PM Screenshot 2024-05-23 at 2 35 12 PM

Checklist: