apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.36k stars 13.69k forks source link

Dynamic SQL generation at Schema Level and Database Level #17302

Closed bvy007 closed 2 years ago

bvy007 commented 2 years ago

I would like to know a way to work with superset in my scenario of creating a dashboard at work. I have different schemas in a database as shown in following directory structure:

Database ├── public
│ ├── schema_meta_Info
│ └── Statistics

├── DBClient1
│ ├── financial_data
│ ├── customer_data
│ └── Sales_data
│ ├── DBClient2
│ ├── financial_data
│ ├── customer_data
│ └── Sales_data
│ ├── DBClient3
│ ├── financial_data
│ ├── customer_data
│ └── Sales_data
│ ├── DBClient4
│ ├── financial_data
│ ├── customer_data
│ └── Sales_data

DBClient : Schemas _financial_data : financial information of customers in Client customer_data : Customers of Client Sales_data_ : Sales Information of Client's Customers

Question 1: And I am trying to create a dynamic SQL query and add it to the dashboard in superset. For this I am trying to have a chart with drop down(like filter in superset) option and wanted to select the DBClient (DBClient1 (or) DBClient2 (or) DBClient3 (or) DBClient4). Based on the my selection, I would like to show my plots with an updated query in the background.

For example plot1 in my dashboard wants to have a query like:

select * from {{SelectedDBClient}}.financial_data;

and if I had selected DBclient3 as the schema in the drop down plot of the dashboard, it should update the query of the plot1 as following:

select * from DBclient3 .financial_data;

Options that I tried:

How can I achieve this ??. In my scenario, I had 10000 schemas with same tables inside. Also assume that I am an admin (Assume no RBAC for the datasets are needed).

Question 2: Also I am curious to know how this can be done in multi database scenario (same conditions applied from Question 1 ) ?

rumbin commented 2 years ago

You can use Jinja, referring to the filter values. Just ensure you add a db_client column to the query which just contains the filter value and also refer to the filter value in the schema of the FROM clause.

The dashboard filter needs to be populated from another dataset that returns all possible db_clients then.

spdustin commented 2 years ago

I have a similar requirement, and I solved it by creating a "rollup view" for each table that looks something like this:

SELECT
    'Acme' as client_name,
    *
FROM
    schema_for_acme.my_table
UNION ALL
SELECT
    'Widgets, Inc.' as client_name,
    *
FROM
    schema_for_widgets_inc.my_table
UNION ALL
-- etc.

As a result, we can use a "master" dashboard with a filter control that staff can use, and use row-level security and RBAC to allow clients to see that same dashboard, but limit the data being returned. Client accounts are also prevented from accessing SQL Lab, etc.

The view is a temporary construct—we'll have a stage in our dbt pipeline that outputs fully materialized "master" tables.