Open choldgraf opened 1 year ago
Note that we removed the closest we had to this upstream recently as nobody found time to validate those metrics: https://github.com/jupyterhub/grafana-dashboards/pull/45
@yuvipanda are you saying that in order to implement this we will need to do our own validation?
"Upstream" is a bit blurred as usual, given I'm the person who made that PR and merged it - primarily because we couldn't find time / resources on the berkeley side to actually validate that that dashboard isn't lying.
I think steps here are to:
The upshot basically is that the query for this already exists (in that reverted PR), but definitely needs work validating it before we can actually use it.
How about we:
Maybe we can be a test case that validates the right implementation? IMO this is worth devoting 2i2c resources towards, since it solves both our problems and would be useful for the broader community.
Cross-referencing https://github.com/2i2c-org/infrastructure/issues/1785#issuecomment-1308494647 which might be helpful.
@choldgraf an even more concrete proposal would be:
Once that's there, we will be able to have a query we know produces useful and correct results. Happy to support whoever wants to work on this though.
That sounds great to me - I really appreciate the step-by-step nature of this proposal!
https://github.com/berkeley-dsep-infra/datahub-usage-analysis/tree/master/notebooks has code for analysing hub logs to get accurate info on monthly active users as well! We can probably be much simpler than that - 'monthly active users' is really 'the number of unique user names that have logged in within a given 30-day period', and that's easier to calculate as well.
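To make that definition concrete, here is a minimal sketch of the calculation. It assumes login events are already available as (username, timestamp) pairs; that shape and the function name `monthly_active_users` are made up for illustration and are not what the linked notebooks or the hub logs actually use.

```python
from datetime import datetime, timedelta


def monthly_active_users(logins, window_end, window_days=30):
    """Count unique user names with at least one login in the window.

    `logins` is an iterable of (username, timestamp) pairs. This input
    shape is an assumption for illustration, not the real hub log format.
    """
    window_start = window_end - timedelta(days=window_days)
    # A set de-duplicates users who logged in more than once in the window.
    active = {
        user for user, ts in logins
        if window_start < ts <= window_end
    }
    return len(active)


# Hypothetical sample data, not real usage numbers.
logins = [
    ("alice", datetime(2022, 11, 1)),
    ("alice", datetime(2022, 11, 20)),
    ("bob", datetime(2022, 11, 15)),
    ("carol", datetime(2022, 9, 30)),
]
mau = monthly_active_users(logins, datetime(2022, 11, 30))
```

Repeating the call with a different `window_end` for each day would give a rolling 30-day series rather than a single number.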
We received a support request regarding this issue: https://2i2c.freshdesk.com/a/tickets/269
As support steward I think we should bump the priority of this one. cc @damianavila
+1 from me to bump the importance of that, it sounds like it would be useful for Toronto, Berkeley, us for internal metrics, and us because I suspect every decently-sized community will want to know the answer to this!
If we had a nice way to ingest this into notebooks we could also combine it w/ our KPIs page to do some cool stuff http://2i2c.org/kpis
OK, I have raised the priority on this one so we pay attention to it in our next planning meeting.
I spent some time today looking at that grafana dashboard, as well as doing some log analysis. I determined that the right way to do this, so we can absolutely trust our metrics, is to have JupyterHub calculate these so prometheus can store them. I've worked on and opened this PR to enable that: https://github.com/jupyterhub/jupyterhub/pull/4214.
Once that's merged and deployed, we can have a fairly simple query that'll give us daily and monthly active users across our hubs that we can trust.
This is super exciting and will be useful for many others as well.
What happened to this one? I think it is important that we start tracking this information and expose it externally, both for our own understanding of how the service is growing, and so that we can be transparent with others.
Can we please prioritize this one (again?) and decide how to expose the following pieces of information on a page at https://2i2c.org/kpis :
I think this is the minimal information that we need to define this source of truth and close https://github.com/2i2c-org/team-compass/issues/561
Note that if the fix isn't appropriate at the grafana dashboard level, it's fine to do it somewhere else, but we need to expose this information somewhere.
I opened https://github.com/2i2c-org/infrastructure/pull/2136 which will deploy https://github.com/jupyterhub/jupyterhub/pull/4214 so we can start collecting these numbers.
@choldgraf this has been deployed now!
While this will get sucked into prometheus as well, it's also publicly available if you wanna scrape it for KPIs.
As an example, if you look at m2lines.2i2c.cloud/metrics, you will see:
# HELP jupyterhub_active_users number of users who were active in the given time period
# TYPE jupyterhub_active_users gauge
jupyterhub_active_users{period="24h"} 4.0
jupyterhub_active_users{period="7d"} 9.0
jupyterhub_active_users{period="30d"} 35.0
This can be parsed out for KPI purposes perhaps. You can find a list of all clusters in https://github.com/2i2c-org/infrastructure/tree/master/config/clusters, and then look at the 'domain' property inside the 'cluster.yaml' file to get a list of all hubs (https://github.com/2i2c-org/infrastructure/blob/9516aad1104f57bb059aba7d84aef95c57ec53fc/config/clusters/2i2c-uk/cluster.yaml#L15) - maybe that's already being parsed out?
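For anyone who wants to try the scraping route, here is a rough sketch of pulling the three gauges out of the metrics text. It only handles the `jupyterhub_active_users` lines shown above with a simple regular expression; the function name `parse_active_users` is hypothetical, and a proper Prometheus text-format parser would be more robust.

```python
import re

# Sample text copied from the m2lines.2i2c.cloud/metrics output quoted above.
SAMPLE = """\
# HELP jupyterhub_active_users number of users who were active in the given time period
# TYPE jupyterhub_active_users gauge
jupyterhub_active_users{period="24h"} 4.0
jupyterhub_active_users{period="7d"} 9.0
jupyterhub_active_users{period="30d"} 35.0
"""

# Matches only the single-label form shown in the sample (an assumption).
_LINE = re.compile(r'^jupyterhub_active_users\{period="([^"]+)"\}\s+(\S+)')


def parse_active_users(metrics_text):
    """Return a dict mapping period label (e.g. "30d") to active user count."""
    result = {}
    for line in metrics_text.splitlines():
        match = _LINE.match(line)
        if match:  # comment lines and other metrics are skipped
            result[match.group(1)] = float(match.group(2))
    return result


active = parse_active_users(SAMPLE)
```

In practice the text would come from fetching each hub's `/metrics` URL (for example with `urllib.request.urlopen`), looping over the domains collected from the `cluster.yaml` files.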
https://github.com/jupyterhub/grafana-dashboards/pull/56 adds this graph to the JupyterHub stats dashboard (not a global usage one).
@damianavila what is the PR that changed this one, or where is the dashboard? When I go to the global usage dashboard at grafana.pilot.2i2c.cloud, I still only see the one chart with the bars:
Whereas the dashboard for active users on each cluster has 3 graphs and includes them over time:
It's in the default JupyterHub dashboard, but not in the global use one. Re-opening this to add it to that dashboard as well.
Context
We track "global usage" in this grafana dashboard. Currently it consists of a single plot showing the total active users over the last 7 days for each cluster.
In a recent conversation we agreed that Monthly Active Users was one of the metrics we'd like to track to observe the growth of 2i2c's overall service operations.
Proposal
I propose that we add a plot that shows Unique users with a 30-day rolling window for each cluster, and across all clusters. This would be a timeseries plot over the last 6 months. We can use this as the source of truth for our "Monthly Active Users" KPI for now.
Alternatively we could roll our own MAU plot via the Prometheus data we can access in the instructions here:
Updates and actions