Closed rjurney closed 3 years ago
Notes:
• Rename superset examples load to superset examples import
• Rename superset examples create to superset examples export
• The superset examples subcommand should support a "--repo" argument
• EXAMPLES_REPO_URIS should be a list that by default points to the examples-data repo
• Remove Database, DruidCluster, and .*Schedule.* from the scope of models exported
• Add a --nodata option on both export and import

--nodata is not implemented in the PR or SIP. The other parts are addressed in the PR and SIP.
I'm closing this for a few reasons:
• Some of it (import/export, UUIDs) has already been tackled.
• It's been open for a long time, without being brought to a DISCUSS thread.
• It's pretty broad in scope, and may need to be broken down into smaller pieces if we want to carry it through.
@rjurney if you'd like to reopen this, update a little of the context, and perhaps break it down into smaller chunks for discussion/voting/implementation, just say the word, and I/we can re-open it! Thank you for all the hard work and thought that has gone into this, it'll definitely serve as a useful reference for work going forward in this area either way!
[SIP] Proposal for Improving Examples Interface, Organization and Storage
The goal of the changes in this proposal is to improve the examples capabilities of Superset so as to foster an ecosystem of examples which will sustain and grow as the platform continues to develop. First I will characterize the existing system of examples and then propose changes to improve the number and quality of examples.
Current Examples
Examples are currently programmatically defined in the superset.data module. An abstract interface summarizing these examples looks like the following:

While this mechanism jump-started the collection of Superset examples, defining examples as code will not appeal to most Superset users or help grow a community that contributes examples.
Current Dashboard Import/Export
Dashboard example creation could utilize the export feature by adding example-oriented fields to the export. Dashboards can be exported via the Dashboard List interface at /dashboard/list/ via its Export action, or at the command line via superset export_dashboards. Dashboard and chart export JSON includes everything needed to reproduce a dashboard save the actual data table: dashboard, chart and datasource information.

The datasources.__SqlaTable__.database element will need to be removed when examples are created and recreated when they are loaded to match SQLALCHEMY_EXAMPLES_URI or a --database-uri the user specifies. Each Slice's datasource_id and datasource_name must be changed.

A Slice object has a params_dict which contains the following. Note that this includes references to the datasource_name wb_health_population.

Example Components
A Superset example is a SQL oriented dashboard and is composed of the following:
All of the above with the exception of the Datasource.Database entry will need to be serialized, stored, contributed, approved, listed, deserialized and loaded by the example system. The Datasource.Database entry will need to be removed on exporting and replaced on importing of examples.
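The remove-on-export and replace-on-import step for the database entry could be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the helper names strip_database and attach_database are hypothetical, and the shape of the restored database entry (a single sqlalchemy_uri field) is an assumption; only the datasources / __SqlaTable__ / database nesting comes from the export format described above.

```python
import copy


def strip_database(export: dict) -> dict:
    """Return a copy of an export dict with each datasource's database entry removed."""
    cleaned = copy.deepcopy(export)
    for datasource in cleaned.get("datasources", []):
        table = datasource.get("__SqlaTable__", {})
        table.pop("database", None)
    return cleaned


def attach_database(export: dict, database_uri: str) -> dict:
    """Return a copy whose datasources point at the given examples database."""
    loaded = copy.deepcopy(export)
    for datasource in loaded.get("datasources", []):
        table = datasource.get("__SqlaTable__")
        if table is not None:
            # Assumed shape: the real export stores a fuller Database object.
            table["database"] = {"sqlalchemy_uri": database_uri}
    return loaded


# A toy export dict; real exports carry many more fields.
exported = {
    "dashboards": [],
    "datasources": [
        {
            "__SqlaTable__": {
                "table_name": "wb_health_population",
                "database": {"sqlalchemy_uri": "sqlite:////tmp/superset.db"},
            }
        }
    ],
}

example = strip_database(exported)
restored = attach_database(example, "sqlite:////tmp/examples.db")
```

Working on deep copies keeps the original export untouched, so the same dict can be exported once and loaded against several target databases.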
Scope of Improvement
This proposal improves the superset example process in three areas: example creation, data storage and discoverability.
In order to improve the range and quality of Superset examples we need to first improve the process for creating and loading examples. While examples can be created programmatically, the more natural process is to use Superset to create them. This requires that we automate the process to persist and restore the combined state of the Superset Dashboard, Database and related objects as well as the contents of the datasource itself.
We also need a directory to which examples can be uploaded and a corresponding user interface and process of governance over that repository. This directory should be independent of the Superset project release process and code repository. Current processes for management of Superset’s code assets would transition directly to the management of its examples: changes would be created by creating Github issues and pull requests, data assets would be versioned and managed in a central repository.
Finally we need a user interface for finding, listing and loading examples from the repository. It should be simple and can exist as an examples command as part of the superset CLI which will have export, list, import and remove sub-commands.

Example Repository Requirements
The requirements for example storage are that it have the following properties:
Git and Github are a desirable mechanism for publishing and approval but an undesirable mechanism for storage. Git LFS (Large File Storage) offers scalable storage while still using Github for project management. With a 2GB file limit and support on Github for 250 of these files, it scales well and is the proposed storage system. Other options are explored in the addendum.
Example File Format
Examples should be defined and packaged in a standard manner and each example should be self-contained in its own file system directory. The existing Dashboard export format adequately describes a dashboard, its charts and the associated datasources but is missing human-readable fields describing the contents of the dataset and dashboard as well as the physical location of the contents of the tables the datasource metadata describes. These fields will be added to the Superset dashboard export format.

Data location information will be stored in a top-level files key next to the existing dashboards, slices and datasources keys. A top-level description field will fill out the fields of a description of the dataset in the examples directory. The existing World’s Bank Data example is extended below:

The file layout for this example appears as follows, with the dashboard slug used as the directory name in the exported tarball and examples directory:
Example Data Table Format
In order to manage tables, to create and drop them, it is helpful to assume that an Integer id primary column is present. This is the case for all current example dashboard tables. In the future we may want to support tables with uuid or other types of primary column.

New or Changed Public Interfaces
Changes include the addition of the SQLALCHEMY_EXAMPLES_URI and EXAMPLES_GIT_TAG configuration keys and changes to the model classes as well as the CLI.

The examples-data Repository

Currently the example data is on Github at apache-superset/examples-data. This will continue to be the case, but this repository will now house both Dashboard metadata files as well as data files via Git LFS. Each example will have its own directory with its own dashboard.json and data files.

The README.md for this repository in its new form can be accessed here: GitHub - rjurney/examples-data at lfs.
SQLALCHEMY_EXAMPLES_URI Configuration Key

A SQLALCHEMY_EXAMPLES_URI configuration key in superset/config.py controls the default location to load examples into. This defaults to ~/.superset/examples.db and can be overridden on a per-import basis using the --database-uri/-d option.

EXAMPLE_REPOS_TAGS Configuration Key

An EXAMPLE_REPOS_TAGS configuration key in superset/config.py controls the locations of the examples from which to list and load. This can be set manually using the --examples-repo/-r option. Each item is a tuple containing the full repository name (e.g. apache-superset/examples-data) and the git tag/branch of the repository to use (e.g. master).

In config.py the default entry will look like:

GITHUB_AUTH_TOKEN Configuration Key

Github rate limits the contents API to 60 anonymous requests per hour. While this is unlikely to affect many users, the limit is by IP address, which means users behind proxies or developers may sometimes encounter this. I have added the optional configuration key GITHUB_AUTH_TOKEN which provides a way to add a personal access token to requests from the examples sub-commands. This increases the API limit.

UUIDs via sqlalchemy_utils.types.uuid.UUIDType
In order to export or import assets in a way that doesn’t result in integer primary key chaos, we require that each serialized asset have a unique identifier. The superset.models.helpers.ImportMixin class has been used to provide a uuid field to the following classes:
• Dashboard
• Datasource
• Database
• DruidCluster
• DruidMetric
• Slice
• SqlMetric
• SqlaTable
• TableColumn
This required patching FlaskAppBuilder to support UUIDType as a String field type. This will be released with FlaskAppBuilder 2.1.4.
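To illustrate why a uuid field avoids integer primary key collisions, here is a small sketch of an import that matches assets on uuid rather than on id. It uses plain dictionaries instead of Superset's model classes, and the function name import_assets is hypothetical; it shows the general idea under those assumptions, not the logic in ImportMixin.

```python
import uuid


def import_assets(existing: list, incoming: list) -> list:
    """Merge incoming assets into existing ones, matching on uuid instead of id."""
    by_uuid = {row["uuid"]: row for row in existing}
    next_id = max((row["id"] for row in existing), default=0) + 1
    for asset in incoming:
        match = by_uuid.get(asset["uuid"])
        if match is not None:
            # Same asset seen before: update it but keep the local integer id.
            match.update({k: v for k, v in asset.items() if k != "id"})
        else:
            # New asset: ignore the exporter's id and assign a fresh local one.
            new_row = dict(asset, id=next_id)
            next_id += 1
            existing.append(new_row)
            by_uuid[new_row["uuid"]] = new_row
    return existing


local_uuid = str(uuid.uuid4())
other_uuid = str(uuid.uuid4())

local_rows = [{"id": 1, "uuid": local_uuid, "title": "Local dashboard"}]
incoming_rows = [
    # Clashes with the local integer id 1 but is a different asset.
    {"id": 1, "uuid": other_uuid, "title": "Imported dashboard"},
    # Different integer id, but the same asset as the local row.
    {"id": 7, "uuid": local_uuid, "title": "Local dashboard v2"},
]

merged = import_assets(local_rows, incoming_rows)
```

The integer id stays a purely local detail, so two installations can number their rows differently and still agree on which asset is which.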
ImportMixin -> ImportExportMixin

I was confused by the role of ImportMixin in model class export, so accordingly I have renamed it to ImportExportMixin.

Command Line Interface
Example capabilities will be accessed via the command line interface (CLI). The CLI will be changed, removing the load_examples command and replacing it with an examples subcommand with export, list, import and remove commands beneath it.

Top Level CLI Menu
superset --help
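The proposed command layout can be sketched with the standard library's argparse. This is only an illustration of the nesting (an examples subcommand with export, list, import and remove beneath it); Superset's real CLI is built differently, the help strings are invented, and attaching --examples-repo/-r to import is an assumption. The option names themselves are the ones named in this proposal.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser mirroring the proposed `superset examples` layout."""
    parser = argparse.ArgumentParser(prog="superset")
    commands = parser.add_subparsers(dest="command")

    examples = commands.add_parser("examples", help="Work with example dashboards")
    actions = examples.add_subparsers(dest="action")

    # Export options are omitted here; only the sub-command itself is sketched.
    actions.add_parser("export", help="Export a dashboard and its data")

    list_cmd = actions.add_parser("list", help="List available examples")
    list_cmd.add_argument("--examples-repo", "-r", help="Repository to query")

    import_cmd = actions.add_parser("import", help="Load an example")
    import_cmd.add_argument("--database-uri", "-d", help="Target examples database")
    import_cmd.add_argument("--examples-repo", "-r", help="Repository to load from")

    remove_cmd = actions.add_parser("remove", help="Remove an installed example")
    remove_cmd.add_argument("--database-uri", "-d", help="Examples database to clean")

    return parser


args = build_parser().parse_args(["examples", "import", "-d", "sqlite:///examples.db"])
```

Parsing the sample arguments yields command "examples", action "import" and the supplied database URI, which a dispatcher could then route to the matching handler.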
Dashboard Exports Menu
The dashboard exports menu will be extended to add the --dashboard-titles/-t, --export-data/-x and --export-data-dir/-d options which facilitate example export.

superset export_dashboards --help
Examples Top Level Menu
superset examples --help
Example Creation Menu
The examples creation command can be used to export a Dashboard JSON file along with its underlying data tables into a gzipped tarball file. These assets can then be uncompressed in the examples-data project, committed, pushed and then submitted by pull request.

superset examples export --help
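As a rough sketch of what the export step produces, the snippet below packs a dashboard.json and some data files into a gzipped tarball under a directory named after the dashboard slug. The function name package_example, the slug and the data file name are hypothetical; only the dashboard.json name, the slug-as-directory layout and the gzipped tarball come from this proposal.

```python
import io
import json
import pathlib
import tarfile
import tempfile


def package_example(slug: str, dashboard: dict, data_files: dict, out_dir: str) -> str:
    """Write <out_dir>/<slug>.tar.gz containing <slug>/dashboard.json and data files.

    data_files maps a file name to its raw bytes.
    """
    archive_path = pathlib.Path(out_dir) / f"{slug}.tar.gz"
    members = {"dashboard.json": json.dumps(dashboard, indent=2).encode("utf-8")}
    members.update(data_files)
    with tarfile.open(archive_path, "w:gz") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=f"{slug}/{name}")
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return str(archive_path)


with tempfile.TemporaryDirectory() as tmp:
    path = package_example(
        "world_health",  # hypothetical slug
        {"dashboards": [], "datasources": []},
        {"wb_health_population.csv": b"id,country\n1,FR\n"},  # hypothetical data file
        tmp,
    )
    with tarfile.open(path, "r:gz") as archive:
        packed_names = archive.getnames()
```

Unpacking such an archive inside a checkout of examples-data would yield one directory per example, ready to commit.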
Examples List Menu
The examples list command will query the examples-data repository and return a list of available examples along with their metadata. Examples can then be loaded from this list.

superset examples list --help

The output table uses prettytable and looks like this:

Examples Import Menu
The examples import command will download example metadata and data files from the examples-data repository and will load them into the examples database configured via the SQLALCHEMY_EXAMPLES_URI configuration key or using the value supplied by the --database-uri/-d option.

superset examples import --help
Examples Remove Menu
The examples remove command will remove the specified installed example from the metadata tables as well as from the examples database configured via the SQLALCHEMY_EXAMPLES_URI configuration key or using the value supplied by the --database-uri/-d option.

superset examples remove --help
New Dependencies
Creating, removing, listing and loading examples can be handled without Git LFS but adding examples to the examples-data Github repository will require it. This is a developer-only requirement of the examples-data project and not Superset itself.

Git LFS can be installed via:
Migration Plan and Compatibility
Backwards compatibility will be maintained. Existing dashboard export JSON files will continue to work and all existing dashboard examples and their data files will be ported to the new system and stored in the Superset examples repository.
Cross-Repository Management
As both the examples and Superset evolve, some examples will work with newer versions of Superset than others. We must strive to keep all of them up to date, but should also try to make them backwards compatible. It will thus be inevitable that incubator-superset releases will have to point to a branch/tag of superset-examples.

A given release of superset must reference a certain version of the examples-data repository. This is achieved via the EXAMPLES_GIT_TAG configuration key. Alternatively, this could be a branch rather than a tag to facilitate the ongoing update of examples.
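The pinning described above might look something like the following in a config module, together with a helper that builds a request against Github's contents API for a pinned tag or branch. This is a hedged sketch: the values are placeholders, the helper name contents_request is hypothetical, and it uses the (repository, tag) tuple form of EXAMPLE_REPOS_TAGS described earlier rather than a single EXAMPLES_GIT_TAG. The request is only constructed, never sent.

```python
import os
import urllib.request

# Placeholder values; real defaults would live in superset/config.py.
SQLALCHEMY_EXAMPLES_URI = "sqlite:///" + os.path.expanduser("~/.superset/examples.db")
EXAMPLE_REPOS_TAGS = [("apache-superset/examples-data", "master")]
GITHUB_AUTH_TOKEN = None  # optional personal access token


def contents_request(repo: str, tag: str, path: str = "", token=None) -> urllib.request.Request:
    """Build (but do not send) a Github contents API request pinned to a tag or branch."""
    url = f"https://api.github.com/repos/{repo}/contents/{path}?ref={tag}"
    request = urllib.request.Request(url)
    if token:
        # An authenticated request is subject to a higher rate limit.
        request.add_header("Authorization", f"token {token}")
    return request


requests_to_send = [
    contents_request(repo, tag, token=GITHUB_AUTH_TOKEN)
    for repo, tag in EXAMPLE_REPOS_TAGS
]
```

Pinning to a tag keeps a given Superset release reproducible, while pointing at a branch lets examples keep updating between releases.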