Kedro Snowflake Pipelines plugin
About
This plugin allows you to run full Kedro pipelines in Snowflake. Currently it supports:
- a Kedro starter, to get you up to speed fast
- automatically creating Snowflake Stored Procedures from Kedro nodes (using the Snowpark SDK)
- translating a Kedro pipeline into a Snowflake task graph
- running a Kedro pipeline entirely within Snowflake, without any external system
- using Kedro's official SnowparkTableDataSet
- automatically storing intermediate data as Transient Tables (if Snowpark DataFrames are used)
- (New!) MLflow integration with Snowflake, with examples in the Snowflights Kedro starter
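To make the idea of "translating a Kedro pipeline into a Snowflake task graph" more concrete, here is a minimal, hypothetical sketch in plain Python. It is not the plugin's actual implementation. It assumes each node is described only by its input and output dataset names, that each node is wrapped in a stored procedure named `<prefix>_<node>_proc` (a made-up naming scheme), and that nodes without predecessors run after a single placeholder root task. Dependencies are derived the way Kedro does conceptually: a node depends on whichever node produces one of its inputs. Child tasks are linked with Snowflake's `AFTER` clause, and statements are emitted in topological order because a predecessor task must exist before a child can reference it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical node description: node name -> (input datasets, output datasets)
Nodes = Dict[str, Tuple[List[str], List[str]]]


def node_dependencies(nodes: Nodes) -> Dict[str, Set[str]]:
    """Map each node to the set of nodes that produce any of its inputs."""
    producers = {out: name for name, (_, outputs) in nodes.items() for out in outputs}
    return {
        name: {producers[i] for i in inputs if i in producers}
        for name, (inputs, _) in nodes.items()
    }


def task_graph_sql(nodes: Nodes, prefix: str, warehouse: str) -> List[str]:
    """Emit CREATE TASK statements (root first, then nodes in topological order)."""
    deps = node_dependencies(nodes)
    root = f"{prefix}_root"
    # Placeholder root task body (assumption); a task graph starts from one root.
    statements = [f"CREATE OR REPLACE TASK {root} WAREHOUSE = {warehouse} AS SELECT 1"]
    created: Set[str] = set()
    while len(created) < len(nodes):
        # Nodes whose predecessors have all been emitted already.
        ready = sorted(n for n in nodes if n not in created and deps[n] <= created)
        if not ready:
            raise ValueError("pipeline contains a dependency cycle")
        for name in ready:
            # Parentless nodes are attached directly to the root task.
            parents = [f"{prefix}_{p}" for p in sorted(deps[name])] or [root]
            statements.append(
                f"CREATE OR REPLACE TASK {prefix}_{name} WAREHOUSE = {warehouse} "
                f"AFTER {', '.join(parents)} AS CALL {prefix}_{name}_proc()"
            )
        created.update(ready)
    return statements


if __name__ == "__main__":
    example = {
        "preprocess": (["raw"], ["clean"]),
        "train": (["clean"], ["model"]),
        "report": (["clean", "model"], ["metrics"]),
    }
    for stmt in task_graph_sql(example, prefix="default", warehouse="compute_wh"):
        print(stmt)
```

Here `report` ends up with two predecessors (`default_preprocess` and `default_train`), which is how a fan-in in the Kedro pipeline maps onto a task graph.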
Documentation
For detailed documentation, refer to https://kedro-snowflake.readthedocs.io/
Usage
With starter
- Install the plugin
pip install "kedro-snowflake>=0.1.0"
- Create a new project with our Kedro starter ❄️ Snowflights 🚀:
kedro new --starter=snowflights --checkout=master
Then answer the interactive prompts:
```
Project Name
============
Please enter a human readable name for your new project.
Spaces, hyphens, and underscores are allowed.
[Snowflights]:
Snowflake Account
=================
Please enter the name of your Snowflake account.
This is the part of the URL before .snowflakecomputing.com
[]: abc-123
Snowflake User
==============
Please enter the name of your Snowflake user.
[]: user2137
Snowflake Warehouse
===================
Please enter the name of your Snowflake warehouse.
[]: compute-wh
Snowflake Database
==================
Please enter the name of your Snowflake database.
[DEMO]:
Snowflake Schema
================
Please enter the name of your Snowflake schema.
[DEMO]:
Snowflake Password Environment Variable
=======================================
Please enter the name of the environment variable that contains your Snowflake password.
Alternatively, you can re-configure the plugin later to use Kedros credentials.yml
[SNOWFLAKE_PASSWORD]:
Pipeline Name Used As A Snowflake Task Prefix
=============================================
[default]:
Enable Mlflow Integration (See Documentation For The Configuration Instructions)
================================================================================
[False]:
The project name 'Snowflights' has been applied to:
- The project title in /tmp/snowflights/README.md
- The folder created for your project in /tmp/snowflights
- The project's python package in /tmp/snowflights/src/snowflights
```
- Run the project
cd snowflights
kedro snowflake run --wait-for-completion
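The starter asks for the name of an environment variable that holds your password rather than the password itself, presumably so the secret is not written into the generated project. The sketch below illustrates that pattern; the helper `build_connection_params` is hypothetical and not part of the plugin. It assembles a connection dictionary using the key names that Snowpark's `Session.builder.configs(...)` accepts, with defaults mirroring the starter prompts (`DEMO` database and schema, `SNOWFLAKE_PASSWORD` variable). An injectable `environ` mapping keeps it testable without touching the real environment.

```python
import os
from typing import Dict, Mapping, Optional


def build_connection_params(
    account: str,
    user: str,
    warehouse: str,
    database: str = "DEMO",
    schema: str = "DEMO",
    password_env: str = "SNOWFLAKE_PASSWORD",
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build Snowpark-style connection parameters,
    reading the password from an environment variable."""
    env = os.environ if environ is None else environ
    try:
        password = env[password_env]
    except KeyError:
        raise RuntimeError(
            f"Environment variable {password_env!r} is not set; "
            "export your Snowflake password there before running."
        ) from None
    return {
        "account": account,  # the part of the URL before .snowflakecomputing.com
        "user": user,
        "password": password,
        "warehouse": warehouse,
        "database": database,
        "schema": schema,
    }


# With snowflake-snowpark-python installed, the result could be used as:
#   from snowflake.snowpark import Session
#   session = Session.builder.configs(
#       build_connection_params("abc-123", "user2137", "compute-wh")
#   ).create()
```

In practice you would `export SNOWFLAKE_PASSWORD=...` in your shell before running `kedro snowflake run`.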
In an existing Kedro project
- Install the plugin
pip install "kedro-snowflake>=0.1.0"
- Initialize the plugin
kedro snowflake init <ACCOUNT> <USER> <PASSWORD_FROM_ENV> <DATABASE> <SCHEMA> <WAREHOUSE>
- Run the project
kedro snowflake run --wait-for-completion
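Both run commands above use `--wait-for-completion`, which, as the name suggests, keeps the command attached until the run in Snowflake finishes. How the plugin does this internally is not described here; the following is only a generic, hypothetical polling loop illustrating the idea. `get_status` stands in for whatever query reports the run state, and the terminal state names are an assumption modeled on Snowflake task-history states. The `sleep` and `clock` parameters are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

# Assumed terminal states, modeled on Snowflake task-history state names.
TERMINAL_STATES = frozenset({"SUCCEEDED", "FAILED", "CANCELLED"})


def wait_for_completion(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

Returning the final state instead of raising on `FAILED` leaves it to the caller to decide how a failed run should be reported.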
Kedro pipeline in Snowflake Tasks
Execution: