2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

[Initiative] Hub Scale Cost Monitoring #4384

Open Gman0909 opened 2 months ago

Gman0909 commented 2 months ago

Productboard link: https://2i2c.productboard.com/roadmap/7947557-2i2c-roadmap/features/26823459

Description

Institutional leads, as well as department leads in large organizations, need to be able to justify their budgets and ensure those budgets are spent with value in mind. Business intelligence depends on data, and we want to build towards a data reporting infrastructure that makes the usage and cost of a hub, or of a constellation of hubs, more transparent. This enables better decision-making at budget time and offers a sense of security and transparency over a service that is often perceived as a high risk for cost overruns.

To that end, we would like to give community and institutional leaders the ability to monitor the cost and usage of a hub, or groups of hubs, provided by 2i2c.

The solution should provide a dashboard that updates automatically to show up-to-date aggregated cost and usage reports for each hub in a constellation, or for the single hub an administrator has admin rights over. Data should be exportable as reports.
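
As an illustration of the export requirement, here is a minimal sketch that assumes per-hub cost totals have already been aggregated into a plain dictionary of hub name → {cost category: amount}. That shape, the category keys, and the `totals_to_csv` helper are hypothetical, not an existing 2i2c data format. It renders the totals as a CSV report using Python's standard `csv` module. Usage data is left out for brevity.

```python
import csv
import io


def totals_to_csv(totals):
    """Render {hub: {category: cost}} as CSV text, one row per hub plus a total column.

    Hypothetical helper: the input shape is an assumption for illustration.
    """
    # Union of all categories seen, so hubs missing a category still line up.
    categories = sorted({c for per_hub in totals.values() for c in per_hub})
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["hub", *categories, "total"])
    for hub in sorted(totals):
        # Missing categories are reported as zero cost for that hub.
        row = [totals[hub].get(c, 0.0) for c in categories]
        writer.writerow([hub, *row, sum(row)])
    return buf.getvalue()


report = totals_to_csv({
    "hub-b": {"compute": 10.0},
    "hub-a": {"compute": 5.0, "home_storage": 2.5},
})
print(report)
```

The same text could be written to a file or offered as a download from the dashboard. That wiring is not shown here.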

Additionally, we should investigate adding an option to share the dashboard with individuals outside of those with administrative privileges.

Typical use cases:

Scope

We already have a document listing the things cloud providers charge you for. The things we care about, in priority order, are:

  1. Home directory storage (NFS)
  2. Object storage (scratch and persistent)
  3. Compute (nodes)

Each of these costs should be attributable to either:

  1. A common pool that serves all of the hubs (primarily, the core node pool and staging hubs)
  2. A particular hub

Attributing costs to individual users, or to specific subgroups inside a hub, is out of scope.
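
The attribution rule above can be sketched in Python. This is a minimal sketch, assuming billing line items arrive already labelled with a hub name (or unlabelled). `LineItem`, `attribute_costs`, the category keys, `COMMON_POOL`, and the `SHARED_HUBS` set are hypothetical names for illustration, not existing 2i2c code or a real billing schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Cost categories from the Scope list, in priority order (keys are hypothetical).
CATEGORIES = ("home_storage", "object_storage", "compute")

# Hypothetical name of the bucket for costs shared by all hubs.
COMMON_POOL = "common-pool"

# Hypothetical hub names whose costs belong to the common pool (e.g. staging hubs).
SHARED_HUBS = {"staging"}


@dataclass
class LineItem:
    """One billed cost line; this shape is an assumption, not a real billing schema."""
    category: str          # one of CATEGORIES
    cost: float            # amount in the billing currency
    hub: Optional[str]     # hub name from a resource label, or None if unlabelled


def attribute_costs(items):
    """Sum costs per bucket (a hub name or COMMON_POOL) and per category."""
    totals = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0.0))
    for item in items:
        if item.category not in CATEGORIES:
            continue  # other billed services are outside this initiative's scope
        if item.hub is None or item.hub in SHARED_HUBS:
            bucket = COMMON_POOL  # e.g. the core node pool or a staging hub
        else:
            bucket = item.hub
        totals[bucket][item.category] += item.cost
    return dict(totals)


items = [
    LineItem("compute", 120.0, None),         # core node pool, unlabelled
    LineItem("compute", 300.0, "hub-a"),
    LineItem("home_storage", 40.0, "hub-a"),
    LineItem("object_storage", 15.0, "staging"),
]
totals = attribute_costs(items)
print(totals)
```

Costs that land in the common pool are kept as their own bucket rather than split across hubs; how (or whether) to apportion them is a separate policy question.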

### Tasks
- [ ] https://github.com/2i2c-org/infrastructure/issues/4451
- [ ] https://github.com/2i2c-org/infrastructure/issues/4453
- [ ] https://github.com/2i2c-org/infrastructure/issues/4474
- [ ] [EPIC] Support attributing costs to individual hubs automatically on Azure
- [ ] https://github.com/2i2c-org/infrastructure/issues/4551
- [ ] https://github.com/2i2c-org/meta/issues/1467

yuvipanda commented 1 month ago

I've considered OpenCost, and discarded it as not being able to satisfy our needs.

jnywong commented 1 month ago

Just reading the CloudBank ACM paper and they mentioned a closed-source solution called Nutanix BEAM. Are we specifically leaning into open-source solutions here?

Gman0909 commented 1 month ago

Has anyone checked out OpenCost? It's based on Prometheus. It looks like it might have potential for dedicated clusters at least.

jnywong commented 1 month ago

@Gman0909 see yuvi's comment above 😆

Gman0909 commented 1 month ago

D'oh.

aprilmj commented 5 days ago

@Gman0909 and @haroldcampbell will have an offline conversation about how we refine the tasks associated with this (particularly how we get unblocked). There are a lot of unknowns even after spikes (see #4453).