Nike-Inc / brickflow

Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
https://engineering.nike.com/brickflow/
Apache License 2.0
183 stars 39 forks

[FEATURE] Tableau refresh operators #90

Closed maxim-mityutko closed 6 months ago

maxim-mityutko commented 7 months ago

Is your feature request related to a problem? Please describe. Tableau is widely used to expose insights to users via dashboards. However, Tableau offers only limited ways to refresh data sources and workbooks: on-demand, on a schedule, or through a live connection. It is much more convenient to trigger the refresh of Tableau assets from the pipeline itself, as soon as the data is ready. This approach ensures that end users always get the latest data, even if the pipeline is delayed or has to be rescheduled.

Cloud Information

Describe the solution you'd like The solution uses the Tableau Server Client (TSC) library for Python to interact with the Tableau server, identify the assets that require a refresh based on parameters such as name, project, parent project, and site, and asynchronously trigger one or more refresh jobs on the server. To provide a feedback loop into the pipeline, the process should poll the server for the outcome of the job(s), evaluate the results once all jobs have finished, and succeed or fail the task accordingly.
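A rough sketch of the polling part of this idea, to make the feedback loop concrete. It is not tied to any existing brickflow operator. The names `wait_for_jobs`, `raise_on_failure`, and `RefreshFailed` are hypothetical, and the job lookup is injected as a callable so the loop stays independent of the client library. The sketch assumes each job object exposes `completed_at` (`None` while the job is running) and `finish_code` (`0` on success), which is how TSC's job items appear to behave; with TSC, `fetch_job` would presumably be something like `server.jobs.get_by_id`.

```python
import time
from typing import Any, Callable, Dict, Iterable

# Assumption: a finish code of 0 means success, anything else is a failure.
SUCCESS = 0


class RefreshFailed(Exception):
    """Raised when at least one refresh job did not succeed."""


def wait_for_jobs(
    job_ids: Iterable[str],
    fetch_job: Callable[[str], Any],
    poll_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Poll until every job has finished and return {job_id: finish_code}."""
    pending = set(job_ids)
    results: Dict[str, int] = {}
    deadline = time.monotonic() + timeout

    while pending:
        for job_id in sorted(pending):
            job = fetch_job(job_id)
            # Assumption: completed_at stays None while the job is running.
            if job.completed_at is not None:
                results[job_id] = job.finish_code
        pending.difference_update(results)
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Jobs still running after {timeout}s: {sorted(pending)}")
        sleep(poll_interval)

    return results


def raise_on_failure(results: Dict[str, int]) -> None:
    """Fail the pipeline task if any job finished with a non-success code."""
    failed = {job_id: code for job_id, code in results.items() if code != SUCCESS}
    if failed:
        raise RefreshFailed(f"Tableau refresh jobs failed: {failed}")
```

In a task, the job ids returned by the refresh calls would be passed to `wait_for_jobs`, and `raise_on_failure` would then decide whether the task succeeds. Waiting for all jobs before failing keeps the error message complete when several assets are refreshed in one task.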

Describe alternatives you've considered
Live connection: not always feasible when batches run only once or twice per day, because of data volume, user count, and cost.
On-demand: requires manual user intervention to trigger the refresh on the server.
Schedule: won't produce consistent refresh results if the upstream pipeline is delayed or has to be rerun.
