Feature Proposal: Tableau dashboard extractors for databuilder

ccarterlandis commented 4 years ago

Expected Behavior or Use Case

With the recent addition of dashboard support for Amundsen, it's now possible to build extractors for Tableau dashboards and visualizations. Since Tableau is widely used as a data visualization and analysis tool, having the ability to index these Tableau dashboards inside Amundsen gives better context for how the data is actually being used and enables users to discover and share dashboards and visualizations that have already been built.

Service or Ingestion ETL

These extractors would be implemented in the amundsendatabuilder module. Currently, the extractors would not require changes to any other Amundsen module.

Possible Implementation

This proposal is currently a work in progress. You can track the progress here: https://github.com/lyft/amundsendatabuilder/pull/303

Overview

The extractors are built around Tableau workbooks being the Amundsen equivalent of a dashboard. The extractors utilize Tableau's Metadata API to query information about workbooks and their associated entities like projects (dashboard_groups), custom SQL queries (dashboard_query), and sheets/dashboards within the workbooks (dashboard_chart).
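As a rough illustration of the mapping described above, the sketch below turns a parsed Metadata API response into flat dashboard records. The query fields (`projectName`, `sheets`) reflect my understanding of the Metadata API schema and should be checked against your Tableau version; the function name, record keys, and sample response are hypothetical and are not taken from the PR.

```python
from typing import Any, Dict, Iterator

# GraphQL query for the Metadata API. Field names are assumed from the
# public schema; verify them against your Tableau Server version.
WORKBOOKS_QUERY = """
query {
  workbooks {
    id
    name
    projectName
    sheets { id name }
  }
}
"""


def workbook_records(response: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per workbook (hypothetical record shape)."""
    for workbook in response.get("data", {}).get("workbooks", []):
        yield {
            "dashboard_group": workbook.get("projectName"),  # Tableau project
            "dashboard_name": workbook.get("name"),          # Tableau workbook
            "dashboard_id": workbook.get("id"),
            # Sheets inside the workbook become dashboard charts.
            "charts": [sheet["name"] for sheet in workbook.get("sheets", [])],
        }


# Made-up response, shaped like the JSON body the query above would return.
sample_response = {
    "data": {
        "workbooks": [
            {
                "id": "wb-1",
                "name": "Payroll Overview",
                "projectName": "Finance",
                "sheets": [{"id": "s-1", "name": "Monthly Totals"}],
            }
        ]
    }
}

records = list(workbook_records(sample_response))
```

Keeping the response-to-record mapping separate from the HTTP call makes it easy to test against canned responses.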

Relations between the Amundsen dashboard model and Tableau

Luckily, the Tableau Metadata API uses a GraphQL schema for querying, so retrieving the data and loading it into Amundsen's Neo4j graph model is relatively straightforward. However, there are a few notable differences in the conceptual models that need to be addressed:

In Amundsen, dashboard charts belong to dashboard queries; that is, each chart is built on a query. In Tableau, however, the closest thing to a chart is a "sheet", which is not necessarily built on a custom SQL query. There are a few options for resolving this, such as assigning every Tableau sheet that is not built on a query to the same empty query object, or restructuring the hierarchy between dashboard charts and dashboard queries in the model.
While Tableau does support multi-level projects, for simplicity's sake the extractor currently uses only the top-level projects to create the dashboard groups. Is this desirable behavior?
Usage statistics are available only at the sheet level, and only through [Tableau's REST API](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api.htm). Should these be aggregated by workbook and count towards workbook views/frequent users, or ignored?
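To make two of the open questions above concrete, here is a minimal sketch of (a) routing sheets without custom SQL to one shared placeholder query and (b) summing sheet-level view counts per workbook. Every name in it (`NO_QUERY_ID`, `custom_sql_id`, `workbook_id`, `total_views`) is hypothetical and chosen only for illustration; neither option is settled by this proposal.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable

# Hypothetical identifier for the single placeholder query that all
# sheets without a custom SQL query would share.
NO_QUERY_ID = "no_custom_sql"


def query_id_for_sheet(sheet: Dict[str, Any]) -> str:
    """Return the sheet's custom SQL id, or the shared placeholder id."""
    return sheet.get("custom_sql_id") or NO_QUERY_ID


def views_per_workbook(sheet_usage: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Aggregate sheet-level view counts up to the owning workbook."""
    totals: Dict[str, int] = defaultdict(int)
    for row in sheet_usage:
        totals[row["workbook_id"]] += row["total_views"]
    return dict(totals)


# Made-up usage rows, one per sheet.
usage = [
    {"workbook_id": "wb-1", "total_views": 10},
    {"workbook_id": "wb-1", "total_views": 5},
    {"workbook_id": "wb-2", "total_views": 7},
]
workbook_views = views_per_workbook(usage)
```

Summing views is the simplest aggregation; it may overstate workbook popularity when one visit touches several sheets, which is part of what the question above is asking.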

Technical notes

The Tableau dashboard table extractor will likely be specific to each implementation of Amundsen. Currently, there are no solid plans to build an open source version of this extractor, but I would be interested in discussing what a generic version might look like if that would prove useful.
Most of the data needed is available through the Tableau Metadata API, but some, like project descriptions, dashboard previews, and usage statistics, is only available through the Tableau REST API. The two APIs share authorization tokens, so there is some element of reusability that can be abstracted out. While there is currently no solid plan for how these calls to the Tableau REST API will hook into the rest of the extractors, we would like to include this data in the final integration, so any ideas are welcome.
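One way the shared token could be abstracted is a small session object that both kinds of extractor reuse. The sketch below assumes the REST sign-in response carries `credentials.token` and `credentials.site.id`, that both APIs accept the token in an `X-Tableau-Auth` header, and that the Metadata API lives at `/api/metadata/graphql`. The class and function names, the server URL, and the API version string are placeholders, and no network call is made here.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TableauSession:
    """Holds one sign-in result for reuse across both APIs (hypothetical)."""
    server: str
    api_version: str
    site_id: str
    token: str

    def headers(self) -> Dict[str, str]:
        # The same token header is sent to the REST and Metadata APIs.
        return {"X-Tableau-Auth": self.token, "Content-Type": "application/json"}

    def metadata_url(self) -> str:
        return f"{self.server}/api/metadata/graphql"

    def rest_url(self, path: str) -> str:
        return (f"{self.server}/api/{self.api_version}"
                f"/sites/{self.site_id}/{path.lstrip('/')}")


def session_from_signin(server: str, api_version: str,
                        body: Dict[str, Any]) -> TableauSession:
    """Build a session from the parsed JSON body of a REST sign-in response."""
    credentials = body["credentials"]
    return TableauSession(
        server=server.rstrip("/"),
        api_version=api_version,
        site_id=credentials["site"]["id"],
        token=credentials["token"],
    )


# Made-up sign-in response; a real one comes from POST .../auth/signin.
signin_body = {"credentials": {"token": "abc123", "site": {"id": "site-1"}}}
session = session_from_signin("https://tableau.example.com/", "3.9", signin_body)
```

An extractor for usage statistics could then call `session.rest_url("views?includeUsageStatistics=true")` while the workbook extractor posts GraphQL to `session.metadata_url()`, both with `session.headers()`.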

Context

@alevene and I are building this integration on behalf of Gusto. For Gusto's use case, we are interested in exposing Tableau resources in Amundsen to better facilitate the discovery of existing dashboard resources, so we can avoid duplicate dashboard development and provide background on the provenance/lineage of the dashboards.

dorianj commented 3 years ago

This is implemented, right? Please re-open if I missed something major; if we want to make enhancements, new smaller tickets would be good.

ccarterlandis commented 3 years ago

Yep, this is implemented - my internship at Gusto ended before the PR got merged, so I totally forgot about this issue. Sorry about that! I think you are right to close it 🚀
