2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

[Platform Initiative] Hub Scale Cost Monitoring for AWS #4384

Open Gman0909 opened 4 months ago

Gman0909 commented 4 months ago

Productboard link: https://2i2c.productboard.com/roadmap/7947557-2i2c-roadmap/features/26823459

Description

Institutional leads, and department leads in large organizations, need to justify their budgets and show that the money is spent with value in mind. Business intelligence depends on data, so we want to build towards a data reporting infrastructure that makes the usage and cost of a hub, or a constellation of hubs, more transparent. That transparency supports better decision making at budget time and offers a sense of security over a service that is often perceived as a high risk for cost overruns.

To that end, we would like to give community and institutional leaders the ability to monitor the cost and usage of a hub, or groups of hubs, provided by 2i2c.

The solution should provide a dashboard that updates automatically to show up-to-date aggregated cost and usage reports for each hub in a constellation, or for the single hub an administrator has admin rights over. Data should be exportable as reports.

Additionally, we should investigate adding an option to share the dashboard with individuals outside of those with administrative privileges.

Typical use cases:

Scope

We already have a document listing the things cloud providers charge you for. The things we care about, in priority order, are:

  1. Home directory storage (NFS)
  2. Object storage (scratch and persistent)
  3. Compute (nodes)

Each of these costs should be attributable to either:

  1. A common pool that serves all of the hubs (primarily, the core node pool and staging hubs)
  2. A particular hub

Attributing costs to individual users, or to specific subgroups inside a hub, is out of scope.
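To make the attribution rule above concrete, here is a minimal sketch of how cost line items could be rolled up per hub and per shared pool. Everything in it is an assumption for illustration: the record shape, the hub names, the dollar amounts, and the `attribute_costs` helper are hypothetical and do not reflect how the actual dashboards are built.

```python
from collections import defaultdict

# Label used for the common pool that serves all hubs (hypothetical name).
SHARED = "shared"

# Hypothetical cost line items. "hub" is None when the cost belongs to
# infrastructure shared by every hub, such as the core node pool.
records = [
    {"component": "home storage", "hub": "prod", "usd": 12.50},
    {"component": "object storage", "hub": "prod", "usd": 3.25},
    {"component": "compute", "hub": "prod", "usd": 40.00},
    {"component": "compute", "hub": None, "usd": 18.00},
    {"component": "compute", "hub": "staging", "usd": 2.00},
]


def attribute_costs(records, shared_hubs=("staging",)):
    """Sum costs per (owner, component).

    The owner is either a particular hub or the shared pool. Staging hubs
    and items with no hub are folded into the shared pool.
    """
    totals = defaultdict(float)
    for record in records:
        hub = record["hub"]
        owner = SHARED if hub is None or hub in shared_hubs else hub
        totals[(owner, record["component"])] += record["usd"]
    return dict(totals)


totals = attribute_costs(records)
for (owner, component), usd in sorted(totals.items()):
    print(f"{owner:8} {component:16} ${usd:.2f}")
```

Note that nothing here is keyed by user or subgroup, which matches the out-of-scope statement above.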

Definition of Done

Admins of any 2i2c hub can access dashboards and reports where they can monitor up-to-date cost information for their hubs, and export reports with that same information.

### Tasks
- [ ] https://github.com/2i2c-org/infrastructure/issues/4453
- [ ] https://github.com/2i2c-org/infrastructure/issues/4872
- [ ] https://github.com/2i2c-org/infrastructure/issues/5009
- [ ] https://github.com/2i2c-org/infrastructure/issues/4451
- [ ] https://github.com/2i2c-org/infrastructure/issues/4551

yuvipanda commented 3 months ago

I've considered OpenCost, and discarded it as not being able to satisfy our needs.

jnywong commented 3 months ago

Just reading the Cloudbank ACM paper and they mentioned a closed source solution called Nutanix BEAM. Are we specifically leaning into open-source solutions here?

Gman0909 commented 3 months ago

Has anyone checked out OpenCost? It's based on Prometheus. It looks like it might have potential for dedicated clusters at least.

jnywong commented 3 months ago

@Gman0909 see yuvi's comment above 😆

Gman0909 commented 3 months ago

D'oh.

aprilmj commented 2 months ago

@Gman0909 and @haroldcampbell will have an offline conversation about how we refine the tasks associated with this (particularly how we get unblocked). There are a lot of unknowns even after spikes (see #4453).

aprilmj commented 1 month ago

Current update: we have a replicable solution for AWS; what's next? Need a definition of done & would like a showcase - James, Jenny & Jim would like to be able to use this, get feedback from Openscapes and be able to show/tell others how to use.

Note from @yuvipanda: the Openscapes folks are giving feedback via an informal showcase every week, and the metrics we chose determine what features to build.

From @Gman0909: https://grafana.openscapes.2i2c.cloud/d/edw06h7udjwg0b/cloud-cost-attribution?orgId=1&from=now%2FfQ&to=now%2FfQ (dashboard accessible via GitHub login)

aprilmj commented 1 month ago

Action to take next:

  1. @colliand and @Gman0909 and @yuvipanda - how do we formalize the things we've done with Openscapes/on this initiative so far into a repeatable process we can follow for future development of this feature and others?
  2. Post-mortem on this feature (@haroldcampbell to own making that happen)