flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Feature] [Plugins] Data Catalog #706

Closed: EngHabu closed this issue 11 months ago

EngHabu commented 3 years ago

Motivation: Why do you think this is important?

DataCatalog is Flyte's service that provides data lineage, tracking, and memoization for workflow executions.

Goal: What should the final outcome look like, ideally?

A flytekit plugin and a backend WebAPI plugin that retrieve artifacts from DataCatalog and push artifacts to it.

Not an exact example:

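from flytekit import workflow

# `datacatalog` is the proposed plugin module and `my_task` is a placeholder task;
# neither is defined here.
@workflow
def my_wf():
    # Pull an artifact from DataCatalog by tag.
    a = datacatalog.query(tag="abc")
    x = my_task(input=a)

    # Push the task output to DataCatalog under a new tag.
    datacatalog.publish(dataset=x, tag="xyz")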
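
To make the requested query/publish semantics concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: `InMemoryCatalog`, its method signatures, and the placeholder `my_task` are hypothetical. It is not the DataCatalog API or a flytekit plugin, and it runs as plain Python without Flyte.

```python
from typing import Any, Dict


class InMemoryCatalog:
    """Hypothetical stand-in for the proposed plugin; not a Flyte API."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Any] = {}

    def publish(self, dataset: Any, tag: str) -> None:
        # Store the artifact under the tag, replacing any earlier one.
        self._artifacts[tag] = dataset

    def query(self, tag: str) -> Any:
        # Return the artifact stored under the tag, or fail loudly.
        if tag not in self._artifacts:
            raise KeyError(f"no artifact tagged {tag!r}")
        return self._artifacts[tag]


def my_task(input: Any) -> Any:
    # Placeholder task (assumption): doubles each element.
    return [v * 2 for v in input]


def my_wf(catalog: InMemoryCatalog) -> None:
    # Same pull / compute / push shape as the example in the issue.
    a = catalog.query(tag="abc")
    x = my_task(input=a)
    catalog.publish(dataset=x, tag="xyz")


catalog = InMemoryCatalog()
catalog.publish(dataset=[1, 2, 3], tag="abc")
my_wf(catalog)
result = catalog.query(tag="xyz")
```

A real plugin would presumably resolve tags against the DataCatalog service rather than a dictionary, and would have to fit into Flyte's typed workflow model; the sketch only shows the tag-based lookup and publish flow.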

FlytePlugins WebAPI AWS Athena Plugin

Flyte component

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] commented 1 year ago

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏