astronomer / astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
https://astro-sdk-python.rtfd.io/
Apache License 2.0
330 stars 40 forks source link

Support loading metadata columns from stage into table for Snowflake #2023

Closed pankajkoti closed 10 months ago

pankajkoti commented 10 months ago

Adds support to load metadata columns like METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, etc. from stage into target table while natively loading files into Snowflake tables. Read more at: https://docs.snowflake.com/en/user-guide/querying-metadata#example-3-loading-metadata-columns-into-a-table

Note that, you cannot specify both validation_mode and metadata_columns together in the Snowflake load options because when we need to load metadata_columns, we need to explicitly name the metadata columns in the COPY INTO sql statement and such a transformed SQL statement does not allow specifying VALIDATION_MODE with it. It's a limitation for Snowflake queries. The transformed SQL appears like in the snippet in the following link: https://docs.snowflake.com/en/user-guide/querying-metadata#example-3-loading-metadata-columns-into-a-table

closes: #1982

pankajkoti commented 10 months ago

The transformed COPY INTO command executed is like the below when metadata_columns are specified:

'COPY INTO <TABLE_NAME> FROM (SELECT $1,$2,METADATA$FILENAME,METADATA$FILE_ROW_NUMBER,METADATA$FILE_CONTENT_KEY,METADATA$FILE_LAST_MODIFIED,METADATA$START_SCAN_TIME FROM @SANDBOX.<schema_name>.<stage_name>/sample.csv) '
pankajkoti commented 10 months ago

Sample load output with metadata columns

Screenshot 2023-08-28 at 1 09 30 PM
codecov[bot] commented 10 months ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.73% :tada:

Comparison is base (4176abf) 89.54% compared to head (09d2b10) 90.28%. Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2023 +/- ## ========================================== + Coverage 89.54% 90.28% +0.73% ========================================== Files 75 75 Lines 4296 4324 +28 Branches 531 537 +6 ========================================== + Hits 3847 3904 +57 + Misses 354 332 -22 + Partials 95 88 -7 ``` | [Flag](https://app.codecov.io/gh/astronomer/astro-sdk/pull/2023/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer) | Coverage Δ | | |---|---|---| | [PythonSDK](https://app.codecov.io/gh/astronomer/astro-sdk/pull/2023/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer) | `90.28% <100.00%> (+0.73%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files Changed](https://app.codecov.io/gh/astronomer/astro-sdk/pull/2023?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer) | Coverage Δ | | |---|---|---| | [python-sdk/src/astro/databases/snowflake.py](https://app.codecov.io/gh/astronomer/astro-sdk/pull/2023?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer#diff-cHl0aG9uLXNkay9zcmMvYXN0cm8vZGF0YWJhc2VzL3Nub3dmbGFrZS5weQ==) | `85.33% <100.00%> (+1.13%)` | :arrow_up: | | [python-sdk/src/astro/options.py](https://app.codecov.io/gh/astronomer/astro-sdk/pull/2023?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer#diff-cHl0aG9uLXNkay9zcmMvYXN0cm8vb3B0aW9ucy5weQ==) | `97.61% <100.00%> (+0.05%)` | :arrow_up: | ... and [6 files with indirect coverage changes](https://app.codecov.io/gh/astronomer/astro-sdk/pull/2023/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Andrew-Wichmann commented 10 months ago

Nice! Thank you!