flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Plugin] Helper functions for StructuredDatasets and FlyteFiles in Papermill Plugin #3617

Closed peridotml closed 1 year ago

peridotml commented 1 year ago

Use Case

A common use case for the papermill plugin could be automatically generating notebook reports at the end of modeling or ETL pipelines. A notebook is useful because a data scientist can download it and inspect the results further. Without a notebook, that kind of follow-up inspection is much harder.

Problem

Unfortunately, Papermill's inputs are limited, which makes it difficult to get data and files from Flyte into notebooks. Doing so currently requires a few extra tasks.

Idea

Provide helper functions that work with common data types such as StructuredDataset, FlyteDirectory, and FlyteFile.

It could be an input-side counterpart to record_outputs, although serializing the inputs into JSON might be difficult. See https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py#L294.
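
As a rough illustration of the idea, here is a minimal sketch of how such helpers might round-trip inputs through a single JSON string. Everything in it is an assumption made for illustration: `RemoteRef`, `encode_inputs`, and `load_inputs` are hypothetical names, not part of flytekit or the papermill plugin, and `RemoteRef` only stands in for a value like a FlyteFile or StructuredDataset by carrying a kind and a URI.

```python
import json
from dataclasses import dataclass


@dataclass
class RemoteRef:
    """Hypothetical stand-in for a Flyte value (file, directory, dataset)."""
    kind: str  # e.g. "file", "directory", "structured_dataset"
    uri: str   # location of the underlying data


def encode_inputs(**inputs):
    """Task side: pack inputs into one JSON string usable as a notebook parameter.

    Only top-level RemoteRef values are converted; every other value must
    already be JSON-serializable.
    """
    def encode(value):
        if isinstance(value, RemoteRef):
            return {"__ref__": value.kind, "uri": value.uri}
        return value

    return json.dumps({name: encode(value) for name, value in inputs.items()})


def load_inputs(payload):
    """Notebook side: rebuild the references from the JSON parameter."""
    def decode(value):
        if isinstance(value, dict) and "__ref__" in value:
            return RemoteRef(kind=value["__ref__"], uri=value["uri"])
        return value

    return {name: decode(value) for name, value in json.loads(payload).items()}


# Example round trip with a made-up URI.
payload = encode_inputs(
    report_data=RemoteRef("structured_dataset", "s3://bucket/data.parquet"),
    threshold=0.5,
)
restored = load_inputs(payload)
```

A real helper would presumably return the actual Flyte types on the notebook side and let them handle downloading; the sketch only shows that passing references instead of data sidesteps most of the JSON serialization concern.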

peridotml commented 1 year ago

Here is an idea: https://gist.github.com/peridotml/8e3f4d303d117280cd0c9be8fc84f7c3.