dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0

Streamlit front-end for creating dlt pipelines #1415

Open andrejakobsen opened 1 month ago

andrejakobsen commented 1 month ago

Feature description

Extending the current Streamlit app to also let users create and configure dlt pipelines could significantly increase the adoption of dlt in organizations with decentralized data ownership. One of the appealing features of a UI-based tool such as Airbyte is that it enables non-technical members of an organization to ingest data into a data platform by configuring sources and destinations, without having to learn Python or a CLI.

While I really appreciate that dlt is "just" a Python library, I see this as a potential way to increase data democracy in larger organizations, both by making teams self-sufficient up until the pipeline needs to be productionized and by relieving the load on more technical data teams.

Are you a dlt user?

Yes, I run dlt in production.

Use case

I can see two features that would make the current Streamlit app more useful for non-technical users:

  1. A front-end for creating a dlt pipeline by choosing and configuring sources and destinations. This would be similar to the dlt init command, but could remove even more manual boilerplate work, such as renaming sources and resources.
  2. A dlt version of the Airbyte Connector Builder.

Here is a specific example of how the app could be used in an organization with a centralized data platform team and many independent data teams that rely on the centralized team for getting their data into the platform:

A non-technical person experiments in the app and configures the sources and destinations they need to ingest into the platform. After creating a pipeline in the UI, the user tests that all the connections work as expected and performs a test run of the pipeline that displays a sample of the data (which the current Streamlit app already supports). Only once they are happy with the results do they reach out to the data platform team to put the pipeline into production. This could be done either by opening a PR or simply by sending them the generated dlt code.

Proposed solution

In essence, such a Streamlit app would simply generate the dlt configuration and Python code based on a user's selections in Streamlit.
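As a rough illustration of that idea, the sketch below turns a user's selections into the text of a standalone dlt pipeline script. It is a minimal sketch under stated assumptions, not an existing dlt feature. PipelineSelections, render_pipeline_script and the module/function names in the example (crm_source, crm) are hypothetical. The generated script only uses dlt.pipeline(pipeline_name=..., destination=..., dataset_name=...) and pipeline.run(...), and it assumes the chosen source is importable as a module, for example one scaffolded by dlt init. In a real Streamlit app the fields would be filled from widgets such as st.text_input or st.selectbox.

```python
import keyword
from dataclasses import dataclass


@dataclass
class PipelineSelections:
    """Hypothetical container for the choices a user makes in the UI."""

    pipeline_name: str
    source_module: str    # module holding the source (assumed importable, e.g. scaffolded by `dlt init`)
    source_function: str  # name of the source function to call
    destination: str      # dlt destination name, e.g. "duckdb"
    dataset_name: str


def _require_identifier(value: str, what: str) -> None:
    # Module and function names are pasted into an import statement, so every
    # dotted part must be a valid, non-keyword Python identifier.
    for part in value.split("."):
        if not part.isidentifier() or keyword.iskeyword(part):
            raise ValueError(f"{what} must be a valid Python name, got {value!r}")


def render_pipeline_script(sel: PipelineSelections) -> str:
    """Return the source text of a standalone dlt pipeline script."""
    _require_identifier(sel.source_module, "source_module")
    _require_identifier(sel.source_function, "source_function")
    # repr() turns free-text values into safely quoted Python string literals.
    return (
        "import dlt\n"
        f"from {sel.source_module} import {sel.source_function}\n"
        "\n"
        "pipeline = dlt.pipeline(\n"
        f"    pipeline_name={sel.pipeline_name!r},\n"
        f"    destination={sel.destination!r},\n"
        f"    dataset_name={sel.dataset_name!r},\n"
        ")\n"
        f"load_info = pipeline.run({sel.source_function}())\n"
        "print(load_info)\n"
    )


if __name__ == "__main__":
    # Hard-coded here; in the app these would come from Streamlit widgets.
    selections = PipelineSelections(
        pipeline_name="crm_pipeline",
        source_module="crm_source",
        source_function="crm",
        destination="duckdb",
        dataset_name="crm_data",
    )
    print(render_pipeline_script(selections))
```

Generating plain source text (rather than running anything in the app) keeps the hand-off simple: the output can be pasted into a PR for the platform team as-is.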

A dlt version of the Airbyte Connector Builder could be quite easy to implement with the new REST API generic source, considering that Airbyte also uses declarative configuration under the hood:

"The connector builder UI provides an intuitive UI on top of the low-code YAML format and uses built connectors for syncs within the same workspace directly from within the UI. We recommend using it to iterate on your low-code connectors."
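To make the connector-builder idea more concrete, here is a hedged sketch that assembles a declarative config from form rows and renders a script around it. It assumes the REST API source accepts a dict with a "client" block containing "base_url" and a list of "resources", each with a "name" and an "endpoint" holding "path" and optional "params". It also assumes the source is importable as rest_api_source from dlt.sources.rest_api, which is where newer dlt releases expose it (older setups obtained it through dlt init rest_api). Both assumptions should be checked against the current docs. EndpointForm, build_rest_api_config and render_rest_api_script are hypothetical names. Auth and secrets are deliberately left out, since they would normally live in .dlt/secrets.toml rather than in generated code.

```python
import pprint
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EndpointForm:
    """Hypothetical: one endpoint row in a connector-builder form."""

    name: str                                               # resource / table name
    path: str                                               # path relative to base_url
    params: Dict[str, str] = field(default_factory=dict)   # static query parameters


def build_rest_api_config(base_url: str, endpoints: List[EndpointForm]) -> dict:
    """Assemble a declarative config in the shape assumed for the REST API source."""
    if not base_url.startswith(("http://", "https://")):
        raise ValueError("base_url must be an http(s) URL")
    resources = []
    for ep in endpoints:
        endpoint = {"path": ep.path}
        if ep.params:  # only emit params when the user filled some in
            endpoint["params"] = dict(ep.params)
        resources.append({"name": ep.name, "endpoint": endpoint})
    return {"client": {"base_url": base_url}, "resources": resources}


def render_rest_api_script(config: dict, pipeline_name: str, destination: str) -> str:
    # pprint.pformat yields a valid Python literal for dicts, lists and strings,
    # unlike json.dumps, which would emit true/false/null.
    config_literal = pprint.pformat(config, sort_dicts=False)
    return (
        "import dlt\n"
        "from dlt.sources.rest_api import rest_api_source  # assumed import path\n"
        "\n"
        f"config = {config_literal}\n"
        "\n"
        "pipeline = dlt.pipeline(\n"
        f"    pipeline_name={pipeline_name!r},\n"
        f"    destination={destination!r},\n"
        ")\n"
        "print(pipeline.run(rest_api_source(config)))\n"
    )


if __name__ == "__main__":
    cfg = build_rest_api_config(
        "https://api.example.com",  # placeholder URL
        [EndpointForm("issues", "issues", {"state": "open"}), EndpointForm("users", "users")],
    )
    print(render_rest_api_script(cfg, "example_api", "duckdb"))
```

Because the config is plain data, the same dict could also back a live "test run" button in the app before any code is handed over.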

Related issues

https://github.com/dlt-hub/dlt/issues/459

andrejakobsen commented 3 weeks ago

I have started building a very simple Streamlit app that will eventually do some of the things above. Let me know if there is any interest in moving the development to the existing dlt Streamlit app.