Open betodealmeida opened 3 weeks ago
Thanks! Please launch a DISCUSS thread on the Dev mailing list if you want to carry this forward ;)
I think the simplification makes sense. However, the issue I see with this approach is the fact that the request won't be rendered by the chart plugin, which was the intention of SIP-6. While we do currently store the query context in the chart table as part of the chart metadata, this is in a sense a cached version of what the plugin rendered when the chart was saved. Relying on this cached value makes it impossible to roll out changes to a viz plugin's buildQuery
that may have been introduced during a version upgrade, and could lead to dashboards breaking post upgrade. In fact, it can be argued that the introduction of the query_context
field to the chart metadata was in fact technical debt that contradicts SIP-6. So as part of this SIP, I feel we should revisit that SIP first, and think carefully of how we can facilitate an architecture, where
viz.py
, and I'd hate to see us go back there)CC @michael-s-molina @rusackas as this touches on the plugin architecture discussion we've been having.
I agree with both of you @betodealmeida @villebro. I think the motivation for this SIP is super valid, the way we retrieve dashboard's data is highly inefficient. I also agree that the solution proposed will go against the principles of SIP-6 and would require a revisit of the plugin/server architecture. We are actively discussing a complete redesign of the plugin architecture of Superset. I believe this will be a main focus for us next year so my suggestion would be to wait for the redesign to avoid rework.
Argh, good point.
We're going to need that node.js
sidecar, aren't we?
Yeah, either that, or then we change the architecture to by moving query construction to the backend. However, I suspect that will be a pretty big change, so it's likely not going to happen very soon.
[SIP] A simpler chart API for dashboards
Motivation
Historically, the Apache Superset dashboard has been built on top its interactive, no-code, chart builder (”Explore”). While it makes sense to reuse the chart API endpoints for fetching the data for charts in dashboards, in order to make development faster, at the end of the day these are two different workflows: one is an interactive chart creator, the other is a mostly passive chart consumer.
This distinction has left considerable technical debt behind. In order for a dashboard to load a given chart it needs to perform a
POST
request to the the chart API, submitting a request payload that includes thequery
andform_data
needed by the endpoint. This data is redundant, since it’s already available to the backend (in fact, it needs to be fetched from the backend before thePOST
) and attached to the chart ID (which is also passed in the request). This causes the following problems:POST
request is done sending back the data that was just provided by the backend. Not only this is unnecessary, but it opens an attack vector where users can modify the form data in order to fetch unauthorized data — for example, in an embedded dashboard. While this should never be possible by design, there were multiple CVE reports exploiting this channel.POST
requests still benefit from the Superset cache (if one is configured), they won’t benefit from other caches along the way (reverse proxies, browser cache). Replacing some of these withGET
requests will provide for better dashboard performance and fewer costs, since the caching can be done upstream, as long as response headers are correctly returned by Superset.Proposed change
Currently, in order to retrieve the data needed to render a chart with ID 74, a dashboard with ID 8 will issue a request like this:
The request payload is not small:
Again: not only the payload is relatively big, but it’s also completely unnecessary, since it has to be requested from the backend before the
POST
request is made.If the dashboard has native filters in place, they will be passed as additional filters in the request payload (in bold below):
This SIP proposes that chart requests without any applied native filters are to be replaced with
GET
requests to a new API endpoint:This has multiple benefits:
Uses are still able to force-refresh the chart, in which case a simple
POST
request without a request payload is issued:When the dashboard has native filters applied to the chart, a
POST
is necessary. In that case, only the additional configuration is passed. For filters:Or a custom time grain:
The backend will then be responsible for merging the request with the saved chart configuration, in order to produce and return the requested data.
New or Changed Public Interfaces
A new API endpoint will be introduced, as described above.
New dependencies
None.
Migration Plan and Compatibility
The original endpoint should remain, both for compatibility with existing clients but mainly because it’s still used by the no-code chart builder (”Explore”). A new feature flag,
DASHBOARD_SIMPLE_CHART_API
, will be introduced, so that the feature can be tested initially until it’s mature enough to become the default workflow. Eventually, the feature flag should be removed and only the new proposed API should be used.Rejected Alternatives
This proposal has some overlap with the
CLIENT_CACHE
feature introduced by the author in Superset 0.34 and removed in 4.0.0, though it’s conceptually simpler since it doesn’t introduce an explicit Javascript caching layer.Since the proposal is mostly concerned with removing technical debt, as opposed to implementing a new feature, there were no other alternatives considered. Doing
GET
requests for stored data andPOST
ing only as much data as needed are aligned with best practices in REST APIs.We discussed moving the dashboard filters to the URL and use only
GET
requests, but since most browsers have a limit of ~2000 characters in the URL this would break if too many filters were in place.