Motivation
Recently, a proposal was raised for lazy consensus on the Apache Superset dev mailing list to use Scarf telemetry in the download and installation of Apache Superset. This telemetry already lets the community determine several useful things:
Which versions of Superset people are installing and using now (e.g. whether people have largely made the jump from Superset 2.1.x to Superset 3.0.x)
How many people are running off of various SHAs from the repo rather than official Apache releases (this is a work in progress, using ScarfJS in Superset’s dependencies)
The potential fallout of security issues in older installations of Superset
An approximate measure of how many people and organizations are installing Superset, with related aggregated metadata
The above changes have been shown at Superset Town Halls, and the documentation to configure or disable Scarf has already been added to the Superset Documentation.
Now, we’re proposing to take this telemetry to the next obvious place… Superset itself.
Proposed Change
Scarf provides tracking pixels (essentially an HTML image tag) that you can place in your website or product to track visitors to that URL. In Superset’s case, we’ll configure the pixel as follows:
This results in an HTML tag that we can place into Superset… in fact, this is the pixel we’re proposing to add:
A few key details to note about the pixel:
No PII is tracked. Scarf does not retain IP information; it is discarded by the platform upon processing and aggregation.
Scarf pixels respect browsers’ Do Not Track (DNT) settings. Users who enable DNT will not be tracked at all.
All Superset PMC members can have access to the Scarf dashboard upon request.
A Scarf pixel has already been added to the Superset documentation site. This has already proven valuable in letting us know which documentation pages are being viewed most, enabling the documentation working group to be more effective. You can see a sneak preview of that pixel’s data right here:
New or Changed Public Interfaces
The pixel itself will be added to an innocuous yet global part of the Superset UI, i.e. either the header or the footer.
We also realize that not all organizations will approve of this pixel’s presence, for any number of reasons. To support this without requiring deployments to maintain code changes, we’ll add a new feature flag to config.py to enable or disable this pixel (and potentially other previously implemented Scarf features). The flag might be called ENABLE_SUPERSET_TELEMETRY or similar. The flag will be enabled by default.
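As a rough illustration of how such a flag could gate the pixel, here is a minimal Python sketch. It is not the proposed implementation: the flag name is the tentative one mentioned above, SCARF_PIXEL_URL and render_telemetry_pixel are hypothetical names, the URL is a placeholder rather than the real pixel, and a plain dict stands in for whatever config object Superset would actually consult.

```python
from html import escape

# Tentative flag name from the proposal; the final name may differ.
ENABLE_SUPERSET_TELEMETRY = True  # enabled by default, per the proposal

# Placeholder only: the real Scarf pixel URL is not reproduced here.
SCARF_PIXEL_URL = "https://example.invalid/pixel.png"


def render_telemetry_pixel(config: dict) -> str:
    """Return the pixel's <img> tag, or an empty string when telemetry is disabled.

    `config` is a hypothetical stand-in for values loaded from config.py.
    """
    # A missing flag falls back to the proposed default (enabled).
    if not config.get("ENABLE_SUPERSET_TELEMETRY", True):
        return ""
    # Escape the URL so it is safe to embed in an HTML attribute.
    url = escape(config.get("SCARF_PIXEL_URL", SCARF_PIXEL_URL), quote=True)
    return f'<img src="{url}" alt="" />'
```

Under these assumptions, an organization that does not want the pixel would set ENABLE_SUPERSET_TELEMETRY = False in its own config, and the header or footer template would render nothing in its place.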
New dependencies
The only issue here is billing. Scarf has a free tier, but its depth of data retention and access is limited. The higher tiers of Scarf’s billing plans allow additional access to this historical data. Preset will work with Scarf on pricing to sponsor a higher-tier account that provides access to this data, and will make it available to all PMC members.
Migration Plan and Compatibility
This will not add any breaking changes or require migrations.
Rejected Alternatives
Matomo - There is an instance of Matomo analytics managed by Apache, and available to all their projects. This shows all of the data for the Superset website (you can see it here), but there doesn’t appear to be a way to add this tracking into Superset itself via a pixel or other means.
Google Analytics - GA captures much more granular (and thus personally identifiable) information, which is not a goal of this project/integration. There are also rate limits to the free tier of GA, and while those are reasonably high (10M actions/month) the price when they’re hit is not reasonable.