NASA-IMPACT / veda-jupyterhub

VEDA JupyterHub technical planning and documentation

Setup process to clean out stale / unused User home directories #15

Open batpad opened 7 months ago

batpad commented 7 months ago

Problem: Currently, users on the VEDA Hub get a persistent home directory when they log in and can use it to store data, which is persisted on network file storage. Especially for users who might only use the Hub for a single workshop and not for long-running workflows, we want a good way to do housekeeping: regularly look for and clean out home directories that are taking up space and are clearly stale and unused.

As a first pass, we probably don't want to fully automate this, but have a good process + interface + tooling for admin users to regularly review user home directories and clear out things that seem unused / are just taking up space.

Goals for this ticket:

@yuvipanda it would be great if you could, whenever you have time, outline the details for the above and whether we need to build additional functionality or whether this is mostly documentation.

In the future, we may want to automate parts of this clean-up, but for now let's focus on making it as easy as possible for an admin user to manually review and clean up unused resources.

cc @wildintellect

wildintellect commented 7 months ago

This could also be connected to #8, where each tier gets a different grace period for its storage.

yuvipanda commented 7 months ago

Here's the pathway:

  1. Make a grafana dashboard that lists users' home directories, the last time they were modified, how big they are, etc., as a table. This data is already collected by https://github.com/yuvipanda/prometheus-dirsize-exporter. I'll provide the JSON for the grafana dashboard below. It's a 'dirty' export from one particular grafana, and would need to be made as a PR to https://github.com/jupyterhub/grafana-dashboards. This would allow it to show up in the appropriate grafana for the hubs (VEDA, GHG, etc.).
  2. Enable the allusers directory. This puts an allusers directory in the home directory of admin users. Perhaps it can be put somewhere other than $HOME as well, so people don't accidentally delete everyone's home directories? Regardless, this would give admins the access they need to go clean up people's home directories. I think this is a good time for you to try making a PR to the infrastructure repository following that link, @batpad - and we can work through any issues there.
  3. Most importantly, now there needs to be an actual policy here. And this needs to be communicated to the users. What actually happens? Do their files get deleted? Are they notified? If so, how? This is actually the hard part.
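
To make the review in step 1 concrete, here is a minimal Python sketch of the kind of filter an admin might apply to the numbers the dashboard shows. It is not part of any existing tooling: the `HomeDir` record, the `stale_candidates` function, and both thresholds are hypothetical placeholders, and the actual cut-offs are exactly what the policy in step 3 would have to decide. It assumes `dirsize_latest_mtime` is in Unix seconds, which the `* 1000` in the dashboard query suggests.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class HomeDir:
    directory: str       # value of the exporter's `directory` label
    size_bytes: int      # value of dirsize_total_size_bytes
    latest_mtime: float  # value of dirsize_latest_mtime (assumed Unix seconds)


def stale_candidates(dirs, now, max_age_days=180, min_size_bytes=50 * 1024**3):
    """Return directories that are both large and untouched, largest first.

    Both thresholds are placeholders, not an agreed policy.
    """
    cutoff = (now - timedelta(days=max_age_days)).timestamp()
    flagged = [
        d for d in dirs
        if d.latest_mtime < cutoff and d.size_bytes >= min_size_bytes
    ]
    return sorted(flagged, key=lambda d: d.size_bytes, reverse=True)


# Made-up sample data, for illustration only.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
dirs = [
    HomeDir("alice", 80 * 1024**3, datetime(2023, 9, 1, tzinfo=timezone.utc).timestamp()),
    HomeDir("bob", 2 * 1024**3, datetime(2023, 1, 1, tzinfo=timezone.utc).timestamp()),
    HomeDir("carol", 120 * 1024**3, datetime(2024, 5, 20, tzinfo=timezone.utc).timestamp()),
]
for d in stale_candidates(dirs, now):
    print(d.directory, d.size_bytes)
```

The function only produces a list for a human to look at; it deliberately deletes nothing, in keeping with the manual-review goal of this ticket.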

Here's the JSON for the dashboard. You can test it out importing it directly into any of the current hubs' grafanas, but that'll be temporary until a PR gets made.

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 8,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "datasource": {
        "type": "prometheus",
        "uid": "B5M_zxhnz"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "custom": {
            "align": "auto",
            "cellOptions": {
              "type": "auto"
            },
            "inspect": false
          },
          "mappings": [],
          "thresholds": {
            "mode": "percentage",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          }
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "Size"
            },
            "properties": [
              {
                "id": "unit",
                "value": "bytes"
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Last Modified"
            },
            "properties": [
              {
                "id": "unit",
                "value": "dateTimeFromNow"
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "Number of Files"
            },
            "properties": [
              {
                "id": "unit",
                "value": "short"
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "% of total space usage"
            },
            "properties": [
              {
                "id": "unit",
                "value": "percentunit"
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 23,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "options": {
        "cellHeight": "sm",
        "footer": {
          "countRows": false,
          "fields": "",
          "reducer": [
            "sum"
          ],
          "show": false
        },
        "showHeader": true,
        "sortBy": [
          {
            "desc": true,
            "displayName": "Size"
          }
        ]
      },
      "pluginVersion": "10.1.5",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "B5M_zxhnz"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "min(dirsize_latest_mtime) by (directory) * 1000",
          "format": "table",
          "instant": true,
          "legendFormat": "Last Modified",
          "range": false,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "B5M_zxhnz"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "max(dirsize_total_size_bytes) by (directory)",
          "format": "table",
          "hide": false,
          "instant": true,
          "legendFormat": "Total Size",
          "range": false,
          "refId": "B"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "B5M_zxhnz"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "max(dirsize_total_size_bytes) by (directory) / ignoring (directory) group_left sum(dirsize_total_size_bytes) ",
          "format": "table",
          "hide": false,
          "instant": true,
          "legendFormat": "% of total space used",
          "range": false,
          "refId": "D"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "B5M_zxhnz"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "max(dirsize_entries_count) by (directory)",
          "format": "table",
          "hide": false,
          "instant": true,
          "legendFormat": "Items Count",
          "range": false,
          "refId": "C"
        }
      ],
      "title": "User Home Directories Report",
      "transformations": [
        {
          "id": "joinByField",
          "options": {
            "byField": "directory",
            "mode": "outer"
          }
        },
        {
          "id": "organize",
          "options": {
            "excludeByName": {
              "Time 1": true,
              "Time 2": true,
              "Time 3": true,
              "Time 4": true
            },
            "indexByName": {},
            "renameByName": {
              "Time 1": "",
              "Value #A": "Last Modified",
              "Value #B": "Size",
              "Value #C": "Number of Files",
              "Value #D": "% of total space usage"
            }
          }
        }
      ],
      "type": "table"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "B5M_zxhnz"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green"
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 23
      },
      "id": 3,
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "auto"
      },
      "pluginVersion": "10.1.2",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "B5M_zxhnz"
          },
          "editorMode": "code",
          "exemplar": false,
          "expr": "sum(dirsize_total_size_bytes)",
          "instant": false,
          "legendFormat": "__auto",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Total size of home directories",
      "type": "stat"
    }
  ],
  "refresh": "",
  "schemaVersion": 38,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Home Directory Usage Dashboard",
  "uid": "bd232539-52d0-4435-8a62-fe637dc822be",
  "version": 7,
  "weekStart": ""
}
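
Because this is an export from one particular grafana, the numeric `id` and the Prometheus datasource `uid` (`B5M_zxhnz`) belong to that instance. A minimal sketch of retargeting the export before importing it elsewhere follows; `retarget_dashboard` and the `my-prometheus-uid` value are hypothetical, and whether this step is needed at all depends on how the target grafana's import handles datasources, so treat it as an assumption rather than a required procedure.

```python
import copy
import json


def retarget_dashboard(dashboard, datasource_uid):
    """Return a copy of an exported dashboard pointed at another Prometheus datasource."""
    result = copy.deepcopy(dashboard)
    # The numeric id is local to the grafana the export came from.
    result["id"] = None

    def walk(node):
        # Recursively visit every dict and list in the dashboard.
        if isinstance(node, dict):
            ds = node.get("datasource")
            if isinstance(ds, dict) and ds.get("type") == "prometheus":
                ds["uid"] = datasource_uid
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    return result


# A trimmed-down stand-in for the export above.
exported = json.loads("""
{
  "id": 8,
  "annotations": {"list": [{"datasource": {"type": "grafana", "uid": "-- Grafana --"}}]},
  "panels": [
    {
      "datasource": {"type": "prometheus", "uid": "B5M_zxhnz"},
      "targets": [
        {"datasource": {"type": "prometheus", "uid": "B5M_zxhnz"}, "refId": "A"}
      ]
    }
  ]
}
""")
ready = retarget_dashboard(exported, "my-prometheus-uid")  # hypothetical uid
print(json.dumps(ready, indent=2))
```

Only datasources of type `prometheus` are rewritten, so the built-in `-- Grafana --` annotation source is left alone, and the input dictionary is not modified.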

batpad commented 5 months ago

Thanks @yuvipanda for running me through loading this up in Grafana, etc. It was useful to look at the dashboard for the VEDA and GHG hubs. We identified a couple of HOME directories larger than 50 GB that had not been touched in a while, and have reached out to those users asking them to delete things they are not using.

Next actions would be figuring out how to add the JSON above as jsonnet and make a PR to https://github.com/jupyterhub/grafana-dashboards/tree/main/dashboards to add the User Home Directory dashboard as part of the default dashboards for a hub on Grafana.
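
As a starting point for that translation, it may help to see which parts of the export carry the actual content: each panel's title, its type, and its PromQL expressions. The sketch below pulls those out of a dashboard export. `summarize_panels` is a hypothetical helper, the embedded JSON is a trimmed copy of the export above, and nothing here reflects how the grafana-dashboards repository itself is structured.

```python
import json


def summarize_panels(dashboard):
    """List (title, type, [(refId, expr), ...]) for each panel in a dashboard export."""
    summary = []
    for panel in dashboard.get("panels", []):
        queries = [(t.get("refId"), t.get("expr")) for t in panel.get("targets", [])]
        summary.append((panel.get("title"), panel.get("type"), queries))
    return summary


# Trimmed copy of the export: two panels, with a subset of their queries.
exported = json.loads("""
{
  "panels": [
    {
      "title": "User Home Directories Report",
      "type": "table",
      "targets": [
        {"refId": "A", "expr": "min(dirsize_latest_mtime) by (directory) * 1000"},
        {"refId": "B", "expr": "max(dirsize_total_size_bytes) by (directory)"}
      ]
    },
    {
      "title": "Total size of home directories",
      "type": "stat",
      "targets": [
        {"refId": "A", "expr": "sum(dirsize_total_size_bytes)"}
      ]
    }
  ]
}
""")
for title, panel_type, queries in summarize_panels(exported):
    print(f"{title} ({panel_type})")
    for ref_id, expr in queries:
        print(f"  {ref_id}: {expr}")
```

Everything else in the export (grid positions, colours, plugin versions) is presentation detail that a jsonnet version would likely express through its own defaults.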

@sunu not sure if you have experience with configuring Grafana in this way? Let's take a look next week and we can probably reach out to @yuvipanda for guidance.

batpad commented 4 months ago

@yuvipanda - sorry, I tried to look a bit at the .jsonnet files like https://github.com/jupyterhub/grafana-dashboards/blob/main/dashboards/usage-report.jsonnet and at your JSON above, but I don't think I quite understand what exactly needs to be done to make the PR that adds this dashboard, or how that JSON should be translated into jsonnet constructs.

Happy for any tips or where to look to be able to push this through - thank you!