apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

[SIP-122] Integrated monitoring and user activity dashboard #27620

Open TechAuditBI opened 7 months ago

TechAuditBI commented 7 months ago

[SIP-122] Proposal for creating an integrated monitoring and user activity dashboard

Motivation

Most companies using Superset probably need to gather and analyze data about the platform's health and user activity. Currently, solving this task requires a lot of work at several levels, from transforming raw logs to building a dashboard. "Big" BI tools provide such analytics out of the box, so I think we should create something similar in Superset.

Proposed Change

The whole thing can be divided into 3 parts:

  1. DB changes: Raw log data is not suitable for building a dashboard, so some sort of datamart should be created. For PostgreSQL, I think it would be nice to have a materialized view with all the necessary data and a procedure that performs the required transformations. Such a structure would make it easy to build an ETL that moves the datamart elsewhere if needed, while still not being too complicated.

  2. Dashboard itself: I'm quite sure users have already created about a billion variations of this dashboard, so we just need to bring the popular elements together and leave some space for specifics.

  3. Deployment: We all know and love the example dashboards, so maybe a similar mechanism could be used to deploy the analytics dashboard. It should surely be optional and kept separate from the examples. There are many small details to think through here; for example, some parameters may need to be exposed in the config file. To be discussed further.
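To make the datamart idea in point 1 more concrete, here is a minimal sketch of a "raw logs to aggregated table" transformation. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; since SQLite has no materialized views, `refresh_datamart` drops and rebuilds a plain table, which is roughly what a full `REFRESH MATERIALIZED VIEW` does. The `logs` columns are a simplified assumption loosely modeled on Superset's log table, and the `monitoring_daily_activity` name and the sample rows are hypothetical.

```python
import sqlite3

# Simplified stand-in for Superset's log table (assumed columns, not the real schema).
RAW_SCHEMA = """
CREATE TABLE logs (
    id INTEGER PRIMARY KEY,
    action TEXT,
    user_id INTEGER,
    dashboard_id INTEGER,
    dttm TEXT
)
"""

# Hypothetical datamart: event counts per day, user and dashboard.
DATAMART_SQL = """
CREATE TABLE monitoring_daily_activity AS
SELECT date(dttm) AS activity_date,
       user_id,
       dashboard_id,
       COUNT(*) AS event_count
FROM logs
WHERE dashboard_id IS NOT NULL
GROUP BY date(dttm), user_id, dashboard_id
"""


def refresh_datamart(conn: sqlite3.Connection) -> None:
    """Drop and rebuild the aggregate, similar to a full materialized-view refresh."""
    conn.execute("DROP TABLE IF EXISTS monitoring_daily_activity")
    conn.execute(DATAMART_SQL)
    conn.commit()


# Usage with made-up sample events.
conn = sqlite3.connect(":memory:")
conn.execute(RAW_SCHEMA)
conn.executemany(
    "INSERT INTO logs (action, user_id, dashboard_id, dttm) VALUES (?, ?, ?, ?)",
    [
        ("dashboard_view", 1, 10, "2024-03-20 09:15:00"),
        ("dashboard_view", 1, 10, "2024-03-20 11:40:00"),
        ("dashboard_view", 2, 10, "2024-03-20 12:05:00"),
        ("chart_view", 1, None, "2024-03-21 08:00:00"),  # no dashboard: excluded
    ],
)
refresh_datamart(conn)
rows = conn.execute(
    "SELECT * FROM monitoring_daily_activity ORDER BY activity_date, user_id"
).fetchall()
```

In PostgreSQL the same `SELECT` could back a `CREATE MATERIALIZED VIEW`, and an external ETL could read from it just as easily as from a table.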

New or Changed Public Interfaces

Well, a new command like "superset load_monitoring" will definitely be needed. Beyond that, some new API endpoints might be nice, but I'm not sure about this yet.
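A sketch of how the proposed command could stay optional and separate from the examples. Superset's real CLI is built on Click, so this `argparse` version only illustrates the control flow; the `MONITORING_DASHBOARD_ENABLED` flag, the `--force` option and the step names are all hypothetical.

```python
import argparse
from typing import Optional, Sequence

# Hypothetical config key: the SIP only says some parameters may move to the config file.
DEFAULT_CONFIG = {"MONITORING_DASHBOARD_ENABLED": False}

# Hypothetical steps, loosely mirroring the DB and dashboard parts of the proposal.
STEPS = ["create_datamart", "refresh_datamart", "import_dashboard"]


def load_monitoring(config: dict, force: bool = False, already_loaded: bool = False) -> list:
    """Return the steps a hypothetical `superset load_monitoring` would run."""
    if not config.get("MONITORING_DASHBOARD_ENABLED", False):
        return []  # opt-in: nothing happens unless explicitly enabled
    if already_loaded and not force:
        return []  # don't overwrite an existing monitoring dashboard by default
    return list(STEPS)


def main(argv: Optional[Sequence[str]] = None, config: Optional[dict] = None,
         already_loaded: bool = False) -> list:
    parser = argparse.ArgumentParser(prog="superset load_monitoring")
    parser.add_argument("--force", action="store_true",
                        help="recreate the monitoring assets even if they already exist")
    args = parser.parse_args(argv)
    return load_monitoring(config if config is not None else DEFAULT_CONFIG,
                           force=args.force, already_loaded=already_loaded)
```

Keeping the feature switch in config and the overwrite behavior behind `--force` would let the command be run repeatedly without surprises.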

New dependencies

Probably not needed

Migration Plan and Compatibility

Please help :)

Rejected Alternatives

None as for now

sfirke commented 7 months ago

I agree this would be a good addition to Superset; other products have this out of the box. See the respective docs from Tableau and Metabase detailing which reports they offer, who can see them, etc.

I would suggest that, by default, only users with the Admin role can view these reports, unless restricting access that way requires RBAC to be enabled. Maybe someone has a concrete suggestion for how to handle the sensitivity of this reporting.
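One way to express the Admin-only default as a simple role check. This is purely illustrative: a real implementation would go through Superset's security manager and permissions rather than comparing role names, and `can_view_monitoring` is a hypothetical helper. `Admin`, `Alpha` and `Gamma` are Superset's built-in role names.

```python
from typing import FrozenSet, Iterable

# Assumption: visibility is decided by role name, defaulting to Admin only.
DEFAULT_MONITORING_ROLES: FrozenSet[str] = frozenset({"Admin"})


def can_view_monitoring(user_roles: Iterable[str],
                        allowed_roles: FrozenSet[str] = DEFAULT_MONITORING_ROLES) -> bool:
    """Return True if at least one of the user's roles may see monitoring reports."""
    return not allowed_roles.isdisjoint(user_roles)
```

Widening access would then be a matter of configuring `allowed_roles`, for example adding `Alpha`.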

TechAuditBI commented 7 months ago

@supersetbot orglabel

rusackas commented 7 months ago

Migration Plan and Compatibility

I don't think there's anything to migrate here, per se, but you did mention that there'll be a new command. I assume this would be run like the load examples command?

Rejected alternatives

It seems like this would query the metadata tables directly, which might have performance implications depending on the size of those tables. Perhaps we should consider (or explicitly reject) other ETL/pipeline approaches as alternatives to materialized views. I.e., do these materialized views have a performance cost compared to other approaches?

TechAuditBI commented 6 months ago

Yes, the new command should probably run like the load examples one. Data storage needs further investigation, especially performance-wise. Currently, on our installations we use a procedure-based approach: once a day, during the least active period (around 3 AM), a procedure runs an update script on a set of separate tables. This avoids performance drops during peak hours. But there might be a better approach.
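The nightly window described above boils down to "run at the next 03:00". A minimal sketch of that computation, assuming a fixed refresh time (the hypothetical `REFRESH_AT`); in practice this would more likely be a cron entry or a Celery beat schedule, which Superset already uses for alerts and reports.

```python
from datetime import datetime, time, timedelta

# Assumed refresh slot: 03:00, the least active period mentioned in the comment.
REFRESH_AT = time(hour=3)


def next_refresh(now: datetime, at: time = REFRESH_AT) -> datetime:
    """Return the next moment at which the nightly datamart refresh should run."""
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed, so use tomorrow's
    return candidate
```

Adding `timedelta(days=1)` ignores daylight-saving transitions, which is acceptable for a sketch but worth handling in a real scheduler.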

TechAuditBI commented 6 months ago

Also, speaking of examples: maybe it would be better to put metadata and examples in separate schemas by default? It really bothers me to look at all the mess in the resulting DB. Yes, I know there is a config parameter that allows using a separate DB connection for examples, but it is usually ignored... So maybe we should make things a bit more structured by default.
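A sketch of what the schema split could look like in `superset_config.py`, assuming a PostgreSQL metadata DB reached via psycopg2 and assuming the parameter mentioned above is `SQLALCHEMY_EXAMPLES_URI`. The host, credentials and the `superset`/`examples` schema names are placeholders, and both schemas would need to exist beforehand; passing `search_path` through the libpq `options` parameter is one common way to pin a connection to a schema.

```python
# superset_config.py (sketch; host, credentials and schema names are placeholders)

# Metadata tables go to the "superset" schema.
SQLALCHEMY_DATABASE_URI = (
    "postgresql+psycopg2://superset:superset@db:5432/superset"
    "?options=-csearch_path%3Dsuperset"
)

# Example datasets go to the "examples" schema of the same database.
SQLALCHEMY_EXAMPLES_URI = (
    "postgresql+psycopg2://superset:superset@db:5432/superset"
    "?options=-csearch_path%3Dexamples"
)
```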

rusackas commented 3 months ago

Does this need to move forward as a VOTE thread? If you have time to contribute it, I'm happy to make the vote happen.

jbat commented 2 months ago

re: "about a billion variations of this dashboard" ... looks like Stephan Claus and team at HomeToGo are about to release a new version of their dashboard, V1 article here.

Is there any documentation on what is currently logged in the metadata DB?

And following on from "Currently solving this task would require a bunch of work on levels from transforming raw logs": for context, it would be helpful to know which logs we are referring to, at least as a starting point.

rusackas commented 1 month ago

We need to figure out what metadata databases we actually support officially if we're going to build a built-in dashboard dependent on it.

rusackas commented 2 weeks ago

@TechAuditBI any interest in continuing to move forward with this SIP? Hopefully we can get it ready for a vote soon. This would be an amazing example to have built into Superset, but it seems there's still quite a bit of detail to flesh out in order to pull this off.