
Read a Slurm accounting database to a sqlite3 file
==================================================

This contains a utility, slurm2sql, which uses the `Slurm <https://slurm.schedmd.com/overview>`__ workload manager's sacct to export statistics from jobs and load them into a well-formed SQLite3 file (the database is also made so that it can be queried with DuckDB). This file can then be queried for analytics much more easily than the raw database or your own exports.

Even if SQLite isn't what you need, it provides an easy intermediate file on the way to converting to whatever format you want. In particular, the database is defined so that it can be used with DuckDB, which is a more efficient tool for analytics.

There are also some command line frontends, slurm2sql-sacct and slurm2sql-seff, that use this parsing to print out data in better forms than the built-in Slurm commands (this is especially useful in place of sacct). You can design your own tools like this, as the sketch below shows.
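For instance, a minimal custom tool might load recent jobs into an in-memory database and pretty-print a query with tabulate. This is only a sketch: the chosen columns and the query are illustrative, not part of the library.

.. code-block:: python

   import sqlite3

   import slurm2sql
   import tabulate

   # Load the last week of accounting data into an in-memory database.
   db = sqlite3.connect(':memory:')
   slurm2sql.slurm2sql(db, ['-S', 'now-7days', '-a'])

   # Query whatever you like and print it as a table.
   rows = db.execute('SELECT JobID, User, State FROM slurm LIMIT 10')
   print(tabulate.tabulate(rows, headers=['JobID', 'User', 'State']))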

Installation
------------

Normal pip installation; the package name is slurm2sql. This installs both the library and the command line programs.

::

  pip install slurm2sql

The core slurm2sql library is a single file with no dependencies (so it could also be manually downloaded - HPC, right?), though the command line programs require tabulate. It's made to support very old Python.

Usage
-----

``slurm2sql``
~~~~~~~~~~~~~


Sample usage::

  slurm2sql.py OUTPUT_DB -- [SACCT_FILTER_OPTIONS]

For example, to get all data from July and August (``-S``) for all
users (``-a``)::

  slurm2sql.py sincejuly.sqlite3 -- -S 2019-07-1 -a

To get the data from the last *N* days: this will, day by day, fetch
each day's history and cumulatively update the database.  A database
is updated by default, so the command can be run every day in
order to efficiently keep a running database.  The ``-u`` option means
"don't delete the existing database" (jobs with the same JobID get
updated, not duplicated)::

  slurm2sql.py --history-days=N -u sincejuly.sqlite3 -- -a

The ``--history-start=YYYY-MM-DD`` option can do a similar thing
starting from a certain day, and ``--history=DD-HH:MM:SS`` starts
collecting from a given interval of time ago (the time format is as in
Slurm).
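For example, to collect everything since a fixed date into the same
database (the date here is only an illustration)::

  slurm2sql.py --history-start=2019-07-01 -u sincejuly.sqlite3 -- -a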

To resume from where you left off, first run with one of the history
options.  Then, you can use ``--history-resume`` (no ``-u`` needed) and
it will continue fetching day-by-day from the time you last fetched.
You can run this pair every day: the first command loads the old
history once, the second keeps the database updated::

  slurm2sql.py --history-days=N -u sincejuly.sqlite3 -- -a
  slurm2sql.py --history-resume sincejuly.sqlite3 -- -a

``slurm2sql-sacct``
~~~~~~~~~~~~~~~~~~~

This prints ``sacct`` data in a more readable tabular form, though it probably isn't the most useful part.  Look at its command line options for details.

.. code-block:: console

   $ slurm2sql-sacct SACCT_FILTER

``slurm2sql-seff``
~~~~~~~~~~~~~~~~~~


This is more useful: it prints ``seff``-like output in a tabular
format.  MemReqGiB is per-node, to compare with MaxRSSGiB.


.. code-block:: console

   $ slurm2sql-seff -S now-3day
     JobID User    hours NCPUS CPUeff MemReqGiB MaxRSSGiB MemEff NGpus GPUeff read_MiBps write_MiBps
   ------- ------- ----- ----- ------ --------- --------- ------ ----- ------ ---------- -----------
   1860854 darstr1  0.28     1    87%     50         9.76    20%                  213.88       14.51
   1877467 darstr1  0        0     0%      0                  0%
   1884493 darstr1  0        1     0%      0.49      0        0%
   1884494 darstr1  0        1     0%      0.49      0        0%

From Python
~~~~~~~~~~~

It can also be used from Python as what is essentially a glorified parser.

.. code-block:: python

   import sqlite3
   import slurm2sql

   db = sqlite3.connect(':memory:')
   slurm2sql.slurm2sql(db, ['-S', '2019-08-26'])

For example, you can then convert to a dataframe:

.. code-block:: python

   import pandas as pd

   df = pd.read_sql('SELECT * FROM slurm', db)

From DuckDB
~~~~~~~~~~~


DuckDB is a lot like SQLite, but column-oriented and optimized for
fast processing of data.  The main downsides are slow inserts and
that columns must have consistent data types, but that's the tradeoff
we need.  Slurm2sql's SQLite database is created with type
definitions, so that you can easily open it with DuckDB even without
conversion:

.. code-block:: console

   $ duckdb dump.sqlite3

Or for even more speed, make a temporary in-memory copy (or this could
also be made into a file):

.. code-block:: sql

   -- command line:  $ duckdb database.db
   ATTACH ':memory:' AS tmp;
   CREATE TABLE tmp.slurm AS (SELECT * FROM slurm);
   USE tmp;      -- optional but makes tmp the default

Converting to DuckDB:

.. code-block:: console

    $ duckdb new.duckdb "CREATE TABLE slurm AS (SELECT * FROM sqlite_scan('original.sqlite3', 'slurm'))"

Using DuckDB from Python (on the raw sqlite database):

.. code-block:: python

    import duckdb

    conn = duckdb.connect("database.sqlite3")
    conn.execute("select avg(cputime) from slurm").df()

Database format
---------------

Tables and views:

* Table ``slurm``: the main table with all of the data.  There is one
  row for each item returned by ``sacct``.
* View ``allocations``: has only the jobs (not job steps) (``where
  JobStep is null``).
* View ``eff``: does a lot of processing of ``slurm`` to produce
  ``CPUEff``, ``MemEff``, and ``GPUeff`` values (0.0-1.0 usage
  fractions), in addition to a bit more.
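For example, job efficiency can be read straight from the ``eff``
view.  This is a sketch; it assumes ``eff`` carries the ``User``
column through from ``slurm``:

.. code-block:: sql

   -- Average efficiency fractions (0.0-1.0) per user.
   SELECT User, avg(CPUEff), avg(MemEff)
   FROM eff
   GROUP BY User;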

In general, there is one column for each item returned by ``sacct``,
but some of them are converted into a more useful form.  Some columns
are added by re-processing other columns.  See ``COLUMNS`` in
``slurm2sql.py`` for details.  Extra columns can easily be added.

Developer note: there are two types of converter functions to make the
columns: easy ones, which map one Slurm column directly to a database
column via a function, and line functions, which take the whole row
and can do arbitrary remixing of the data (to compute things like
``CPUEff``).
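A sketch of the two converter shapes (the names and formulas here are
hypothetical illustrations, not the actual code in ``slurm2sql.py``):

.. code-block:: python

   def to_int(raw):
       """Easy converter: one sacct field in, one database value out."""
       # Hypothetical example: empty strings become NULL.
       return int(raw) if raw else None

   def cpu_eff(row):
       """Line function: sees the whole row and remixes its fields."""
       # Hypothetical formula: used CPU time over allocated CPU time,
       # as a 0.0-1.0 fraction (assumes values already in seconds).
       if row['Elapsed'] and row['NCPUS']:
           return row['TotalCPU'] / (row['Elapsed'] * row['NCPUS'])
       return None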

Columns
~~~~~~~

All column values are converted to standard units: bytes (not MB, KB, etc.), seconds, fractions 0.0-1.0 for things like percentages, and unixtime.
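As an illustration of what this conversion means (a sketch, not the
library's actual parser), a Slurm memory value such as ``4G`` becomes
``4 * 2**30`` bytes:

.. code-block:: python

   # Sketch: convert a Slurm size string such as '4G' or '512M' to
   # bytes.  The suffix handling here is illustrative only.
   _SUFFIXES = {'K': 2**10, 'M': 2**20, 'G': 2**30, 'T': 2**40}

   def to_bytes(value):
       if not value:
           return None
       if value[-1] in _SUFFIXES:
           return int(float(value[:-1]) * _SUFFIXES[value[-1]])
       return int(value)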

Columns which are the same in raw sacct output aren't documented specifically here (but note the default units above).

Below are some notable columns which do not exist in sacct (for the rest, check out the `sacct manual page <https://slurm.schedmd.com/sacct.html#lbAF>`_).  It's good to verify that any of our custom columns make sense before trusting them.

Quick reference of the other most important columns from the accounting database that are hardest to remember:

The eff table adds the following:

Changelog
---------

Next
~~~~

0.9.1
~~~~~

Development and maintenance
---------------------------

This could be considered beta right now, but it works and is in use by people.  If this is important for you, comment about your use case in the GitHub issue tracker.  Watch the repo if you want to give comments on future data schema updates and backwards compatibility when Slurm changes.

There are many different variations of Slurm; if it doesn't work for you, send an issue or pull request to help us make it more general - development is only done in response to feedback.

Development principles:

Release process::

  python setup.py sdist bdist_wheel
  twine upload [--repository-url https://test.pypi.org/legacy/] dist/0.9.0

Originally developed at Aalto University, Finland.