apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
63.17k stars 14k forks source link

[SIP-146] Save SQL lab query result and create sharable link #30256

Open xiaoshan1213 opened 2 months ago

xiaoshan1213 commented 2 months ago

Please make sure you are familiar with the SIP process documented here. The SIP will be numbered by a committer upon acceptance.

[SIP] Proposal for Save SQL lab query result and create sharable link

Motivation

Currently, Superset has the function to save the query and create a sharable link, but the query results is not associated with the sharable link, whoever open the sharable link need to rerun the query to get results. It is a waste of compute power and time for our customers. We are proposing to save query result along when user save the query, and associate the result along with the sharable link so whoever open the link can see the result.

Proposed Change

  1. Modify the /api/v1/saved_query/ POST function to include the results id within the request parameter.
  2. Modify the /api/v1/saved_query/{savedQueryId} GET function to return the results id within the response parameter.
  3. Modify the saved query metadata db to include the results id field.
  4. Modify the sql lab UI to fetch results if there is results ID returned from get saved query API.

New or Changed Public Interfaces

  1. Modify the /api/v1/saved_query/ POST function to include the results id within the request parameter.
  2. Modify the /api/v1/saved_query/{savedQueryId} GET function to return the results id within the response parameter.

New dependencies

No

Migration Plan and Compatibility

the metadata db will add a new field results_id, we need to migrate existing db records to include a empty string for this field.

Rejected Alternatives

Another approach not rejected but list here, we can create a new API to save query results and new API to create sharable link to share results so as to not modify the existing APIs.

dosubot[bot] commented 2 months ago

Hey @xiaoshan1213! 👋 I'm here to help you with bug fixes, questions, and becoming a contributor. Let's squash some bugs together!

To implement the proposed changes in the SIP, you can follow these steps:

1. Modify the /api/v1/saved_query/ POST Function

Update the request parameters to include the results_id field. Here is an example of how to modify the post_data dictionary:

post_data = {
    "schema": "schema1",
    "label": "label1",
    "description": "some description",
    "sql": "SELECT col1, col2 from table1",
    "db_id": example_db.id,
    "results_id": some_results_id,  # Add this line
}

2. Modify the /api/v1/saved_query/{savedQueryId} GET Function

Update the response parameters to include the results_id field. Here is an example of how to modify the expected result:

expected_result = {
    "id": saved_query.id,
    "catalog": None,
    "database": {"id": saved_query.database.id, "database_name": "examples"},
    "description": "cool description",
    "changed_by": None,
    "changed_on_delta_humanized": "now",
    "created_by": {
        "first_name": saved_query.created_by.first_name,
        "id": saved_query.created_by.id,
        "last_name": saved_query.created_by.last_name,
    },
    "sql": "SELECT col1, col2 from table1",
    "sql_tables": [{"catalog": None, "schema": None, "table": "table1"}],
    "schema": "schema1",
    "label": "label1",
    "template_parameters": None,
    "results_id": saved_query.results_id,  # Add this line
}
data = json.loads(rv.data.decode("utf-8"))
for key, value in data["result"].items():
    if key != "changed_on":
        assert value == expected_result[key]

3. Modify the Saved Query Metadata Database

Add a new results_id field to the saved_query table. Here is an Alembic migration script to achieve this:

from alembic import op
import sqlalchemy as sa

# Add a new column 'results_id' to the 'saved_query' table
def upgrade():
    op.add_column('saved_query', sa.Column('results_id', sa.Integer, nullable=True))

# Remove the 'results_id' column from the 'saved_query' table
def downgrade():
    op.drop_column('saved_query', 'results_id')

References

These changes will ensure that the query results are saved and associated with the sharable link, allowing users to view the results without rerunning the query.

To continue talking to Dosu, mention @dosu.

rusackas commented 2 months ago

@xiaoshan1213 would you want to move this forward through the SIP / ASF consensus process? Let us know on slack if you need help with that. Thanks!

rusackas commented 1 month ago

There are a few open questions here, such as: • Where are the results being stored/cached to avoid re-running the query • How do we deal with things like the data changing since the query was run (i.e. cache invalidation) • How do we address security implications, e.g. row level security, PII leakage, etc.

Please open a Discuss thread on the Dev list if you want to move forward with this proposal. Thanks!

xiaoshan1213 commented 1 month ago

Hi @rusackas thanks for the response, I just joined the slack, which channel should I use to open the discuss thread?

rusackas commented 4 weeks ago

@xiaoshan1213 Unfortunately, Slack doesn't count for official ASF consensus. There needs to be a DISCUSS thread on the Superset Dev Mailing List.

If you haven't already, you can subscribe to the Apache Superset Dev Mailing Listserv (dev@superset.apache.org, with public visibility here) by sending an email to dev-subscribe@superset.apache.org.

Here's an example of one such thread on the public archives: https://lists.apache.org/thread/ytv9vx7wrk07xxjg4m4kx5rx0zl5whx1

michael-s-molina commented 1 week ago

Thanks @xiaoshan1213 for the SIP. I share the same concerns as @rusackas related to stale data and security implications.

It is a waste of compute power and time for our customers.

I believe this should be resolved by the caching layer instead of a SQL Lab specific feature given that you have the same type of requirements when sharing links to charts, dashboards and SQL Lab queries.