2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

Add Monthly Active Users to our Global Usage Grafana dashboard #1888

Open choldgraf opened 1 year ago

choldgraf commented 1 year ago

Context

We track "global usage" in this Grafana dashboard. Currently it consists of a single plot showing the total active users over the last 7 days for each cluster.

In a recent conversation we agreed that Monthly Active Users was one of the metrics we'd like to track to observe the growth of 2i2c's overall service operations.

Proposal

I propose that we add a plot that shows unique users with a 30-day rolling window, both for each cluster and across all clusters. This would be a timeseries plot over the last 6 months. We can use this as the source of truth for our "Monthly Active Users" KPI for now.

Alternatively we could roll our own MAU plot via the Prometheus data we can access in the instructions here:

Updates and actions

yuvipanda commented 1 year ago

Note that we removed the closest we had to this upstream recently as nobody found time to validate those metrics: https://github.com/jupyterhub/grafana-dashboards/pull/45

choldgraf commented 1 year ago

@yuvipanda are you saying that in order to implement this we will need to do our own validation?

yuvipanda commented 1 year ago

"Upstream" is a bit blurred as usual, given I'm the person who made that PR and merged it - primarily because we couldn't find time / resources on the Berkeley side to actually validate that that dashboard isn't lying.

I think steps here are to:

  1. Look at the queries in that dashboard
  2. Validate them against a source of truth - which is hub logs
  3. If these look valid, revert that PR
  4. Deploy that PR here.

The upshot basically is that the query for this already exists (in that reverted PR), but definitely needs work validating it before we can actually use it.

choldgraf commented 1 year ago

How about we:

Maybe we can be a test-case that validates the right implementation? IMO this is worth devoting 2i2c resources towards, since it solves both of our problems and would be useful for the broader community.

yuvipanda commented 1 year ago

Cross-referencing https://github.com/2i2c-org/infrastructure/issues/1785#issuecomment-1308494647 which might be helpful.

yuvipanda commented 1 year ago

@choldgraf an even more concrete proposal would be:

  1. Look at the 'monthly active users' query in https://github.com/jupyterhub/grafana-dashboards/pull/45/files#diff-1cfe44f507c8820b1182391fb764aaf537c2eaf111c2ed73f720f4cf81ed4ec9L34
  2. Run it against a couple of our hubs via the notebook referenced in the earlier comment
  3. Validate that the numbers produced by that query are correct, by looking at the hub logs. Let's pick two hubs - maybe utoronto, and one more, and I'll produce the logs for those.

Once that's there, we will be able to have a query we know produces useful and correct results. Happy to support whoever wants to work on this though.

choldgraf commented 1 year ago

That sounds great to me - I really appreciate the step-by-step nature of this proposal!

yuvipanda commented 1 year ago

https://github.com/berkeley-dsep-infra/datahub-usage-analysis/tree/master/notebooks has code for analysing hub logs to get accurate info on monthly active users as well! We can probably be much simpler than that: 'monthly active users' is really 'the number of unique usernames that have logged in during a given 30-day period', and that's easier to calculate as well.
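To make that definition concrete, here is a minimal Python sketch of counting unique usernames in a 30-day window. It assumes login events have already been extracted from the hub logs as (username, timestamp) pairs; the function name, the sample data, and the choice of a half-open window are all hypothetical, not taken from the linked notebooks.

```python
from datetime import datetime, timedelta


def monthly_active_users(logins, window_end, window_days=30):
    """Count unique usernames with at least one login in the window.

    `logins` is an iterable of (username, timestamp) pairs. The window is
    the half-open interval (window_end - window_days, window_end].
    """
    window_start = window_end - timedelta(days=window_days)
    active = {
        user for user, ts in logins
        if window_start < ts <= window_end
    }
    return len(active)


# Hypothetical login events, as they might look after parsing hub logs.
logins = [
    ("alice", datetime(2022, 11, 1, 9, 0)),
    ("alice", datetime(2022, 11, 20, 14, 30)),  # repeat login, counted once
    ("bob", datetime(2022, 11, 15, 8, 0)),
    ("carol", datetime(2022, 10, 1, 12, 0)),  # outside the window used below
]

mau = monthly_active_users(logins, window_end=datetime(2022, 11, 30))
print(mau)
```

Calling the function once per day with a moving `window_end` would give the rolling timeseries; a number computed this way from logs could then be compared against whatever the dashboard query reports for the same period.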

sgibson91 commented 1 year ago

We received a support request regarding this issue: https://2i2c.freshdesk.com/a/tickets/269

As support steward I think we should bump the priority of this one. cc @damianavila

choldgraf commented 1 year ago

+1 from me to bump the importance of this. It sounds like it would be useful for Toronto, for Berkeley, and for our own internal metrics, and I suspect every decently-sized community will want to know the answer to this!

If we had a nice way to ingest this into notebooks we could also combine it w/ our KPIs page to do some cool stuff http://2i2c.org/kpis

damianavila commented 1 year ago

OK, I have raised the priority on this one so we pay attention to it in our next planning meeting.

yuvipanda commented 1 year ago

I spent some time today looking at that Grafana dashboard, as well as doing some log analysis. I determined that the right way to do this, so we can absolutely trust our metrics, is to have JupyterHub calculate these so Prometheus can store them. I've worked on and opened this PR to enable that: https://github.com/jupyterhub/jupyterhub/pull/4214.

Once that's merged and deployed, we can have a fairly simple query that'll give us daily and monthly active users across our hubs that we can trust.

choldgraf commented 1 year ago

This is super exciting and will be useful for many others as well.

choldgraf commented 1 year ago

What happened to this one? I think it is important that we start tracking this information and expose it externally, both for our own understanding of how the service is growing, and so that we can be transparent with others.

Can we please prioritize this one (again?) and decide how to expose the following pieces of information on a page at https://2i2c.org/kpis :

I think this is the minimal information that we need to define this source of truth and close https://github.com/2i2c-org/team-compass/issues/561

Note that if the fix isn't appropriate at the grafana dashboard level, it's fine to do it somewhere else, but we need to expose this information somewhere.

yuvipanda commented 1 year ago

I opened https://github.com/2i2c-org/infrastructure/pull/2136 which will deploy https://github.com/jupyterhub/jupyterhub/pull/4214 so we can start collecting these numbers.

choldgraf commented 1 year ago

[GIF: The Mandalorian, "This is the way"]

yuvipanda commented 1 year ago

@choldgraf this has been deployed now!

While this will get sucked into prometheus as well, it's also publicly available if you wanna scrape it for KPIs.

As an example, if you look at m2lines.2i2c.cloud/metrics, you will see:

# HELP jupyterhub_active_users number of users who were active in the given time period
# TYPE jupyterhub_active_users gauge
jupyterhub_active_users{period="24h"} 4.0
jupyterhub_active_users{period="7d"} 9.0
jupyterhub_active_users{period="30d"} 35.0

This can be parsed out for KPI purposes perhaps. You can find a list of all clusters in https://github.com/2i2c-org/infrastructure/tree/master/config/clusters, and then look at the 'domain' property inside the 'cluster.yaml' file to get a list of all hubs (https://github.com/2i2c-org/infrastructure/blob/9516aad1104f57bb059aba7d84aef95c57ec53fc/config/clusters/2i2c-uk/cluster.yaml#L15) - maybe that's already being parsed out?
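As a rough illustration of parsing this out, the Python sketch below extracts the `jupyterhub_active_users` values from metrics text shaped like the example above. The function name and regular expression are hypothetical, and it only handles the exact line shape shown (one `period` label, no trailing timestamp); fetching the text from each hub's `/metrics` URL is left out so the example runs offline.

```python
import re

# Matches lines like: jupyterhub_active_users{period="30d"} 35.0
_ACTIVE_USERS = re.compile(
    r'^jupyterhub_active_users\{period="(?P<period>[^"]+)"\}\s+(?P<value>\S+)$'
)


def parse_active_users(metrics_text):
    """Return {period: count} for the active-users gauge in metrics text."""
    counts = {}
    for line in metrics_text.splitlines():
        match = _ACTIVE_USERS.match(line.strip())
        if match:  # comment lines and other metrics are skipped
            counts[match.group("period")] = float(match.group("value"))
    return counts


# Sample copied from the m2lines output quoted above.
sample = '''\
# HELP jupyterhub_active_users number of users who were active in the given time period
# TYPE jupyterhub_active_users gauge
jupyterhub_active_users{period="24h"} 4.0
jupyterhub_active_users{period="7d"} 9.0
jupyterhub_active_users{period="30d"} 35.0
'''

active = parse_active_users(sample)
print(active["30d"])
```

Running this per hub domain and collecting the `30d` values would give one monthly-active-users number per hub. A dedicated Prometheus text-format parser would likely be more robust than a regular expression if the output grows more complex.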

yuvipanda commented 1 year ago

https://github.com/jupyterhub/grafana-dashboards/pull/56 adds this graph to the JupyterHub stats dashboard (not a global usage one).

choldgraf commented 1 year ago

@damianavila what is the PR that changed this one, or where is the dashboard? When I go to the global usage dashboard at grafana.pilot.2i2c.cloud, I still only see the one chart with the bars:

image

Whereas the dashboard for active users on each cluster has 3 graphs and includes them over time:

image

yuvipanda commented 1 year ago

It's in the default JupyterHub dashboard, but not in the global usage one. Re-opening this to add it to that dashboard as well.