PrincetonUniversity / jobstats

GNU General Public License v2.0

Queries against cgroup_exporter metrics are in a different format than jobstats expects #4

Closed. pmarr closed this issue 9 months ago.

pmarr commented 9 months ago

We're trying to implement jobstats at LSU HPC after hearing about it at PEARC 2023.

With a working Slurm cluster and Prometheus collecting metrics from the cgroup_exporter by plazonic, we get the following error when querying a running job that has accumulated a sufficient amount of sampling time:

./jobstats -c mike -p http://prometheus 132772 -d
DEBUG: jobidraw=132772, start=1694780009, end=Unknown, cluster=mike, tres=billing=64,cpu=64,mem=250G,node=1, data=, user=test123, account=test123_acct, state=RUNNING, timelimit=720, nodes=1, ncpus=64, reqmem=250G, qos=normal, partition=single, jobname=export_test
DEBUG: jobid=132772, jobidraw=132772, start=1694780009, end=1694796200.1725707, gpus=0, diff=16191.172570705414, cluster=mike, data=, timelimitraw=720
DEBUG: query=max_over_time(cgroup_memory_total_bytes{cluster='mike',jobid='132772',step='',task=''}[16191s]), time=1694796200.1725707
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=max_over_time(cgroup_memory_rss_bytes{cluster='mike',jobid='132772',step='',task=''}[16191s]), time=1694796200.1725707
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=max_over_time(cgroup_cpu_total_seconds{cluster='mike',jobid='132772',step='',task=''}[16191s]), time=1694796200.1725707
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=max_over_time(cgroup_cpus{cluster='mike',jobid='132772',step='',task=''}[16191s]), time=1694796200.1725707
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
No stats found for job 132772, either because it is too old or because
it expired from jobstats database. If you are not running this command on the
cluster where the job was run then use the -c option to specify the cluster.
If the run time was very short then try running "seff 132772".

In jobstats, the response j in get_data() comes back from the Prometheus server successfully, but the result value passed to get_data_out() is empty:

{'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
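For clarity, that is the standard shape of a Prometheus instant-query response when no series match the selector. A minimal illustrative check in Python (not the actual jobstats code) of what "empty" means here:

# Illustrative only; not the jobstats implementation.
# A query that matches no time series still returns status "success",
# just with an empty result vector.
response = {'status': 'success', 'data': {'resultType': 'vector', 'result': []}}

if response['status'] == 'success' and not response['data']['result']:
    print('query succeeded but matched no time series')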

Looking at the cgroup_exporter output, we see metrics exported in this format:

cgroup_memory_total_bytes{cgroup="/slurm/uid_11111/job_132772", instance="mike030:9306", job="mike_nodes"}

But jobstats expects:

cgroup_memory_total_bytes{cluster='mike',jobid='132772',step='',task=''}[9303s]

We can confirm that metrics are actually being collected for the cgroups: running the query manually in the Prometheus graph UI, using the cgroup_exporter's label format, correctly returns a result for total memory.
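For reference, the same manual check can also be reproduced outside the graph UI by querying the Prometheus HTTP API directly (a sketch only; 9090 is Prometheus' default port, and the cgroup path and job id are the ones from the output above):

curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode "query=max_over_time(cgroup_memory_total_bytes{cgroup=~'/slurm/uid_.*/job_132772'}[16191s])"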

Are we using the correct cgroup exporter and/or is there something else we are missing?

Thanks in advance.

plazonic commented 9 months ago

You should indeed be using https://github.com/plazonic/cgroup_exporter - that one has a bit of code to extract the job id out of the cgroup path and present it as a jobid label instead. The fact that you are seeing cgroup="/slurm/uid..." makes me think that you do not have the correct one - when you run cgroup_exporter, do you see a --collect.fullslurm option?
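For illustration only (the exporter itself is written in Go, and this is not its actual code): the kind of parsing described above amounts to pulling the numeric job id out of a Slurm cgroup path and exposing it as a label, roughly:

import re

# Example cgroup path taken from the report above.
cgroup_path = '/slurm/uid_11111/job_132772'
match = re.search(r'/job_(\d+)', cgroup_path)
jobid = match.group(1) if match else ''
print(jobid)  # prints 132772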

pmarr commented 9 months ago

I do not see an option for --collect.fullslurm. The releases link in the README.md points to https://github.com/treydock/cgroup_exporter/releases/tag/v0.9.1, which I now see does not contain the CollectFullSlurm option. We pulled the version by treydock instead of the one from the plazonic repo. I will build the correct binary and report back. User error.

./cgroup_exporter --help
usage: cgroup_exporter --config.paths=CONFIG.PATHS [<flags>]

Flags:
  -h, --[no-]help               Show context-sensitive help (also try --help-long and --help-man).
      --config.paths=CONFIG.PATHS
                                Comma separated list of cgroup paths to check, eg /user.slice,/system.slice,/slurm
      --web.listen-address=":9306"
                                Address to listen on for web interface and telemetry.
      --[no-]web.disable-exporter-metrics
                                Exclude metrics about the exporter (promhttp_*, process_*, go_*)
      --path.cgroup.root="/sys/fs/cgroup"
                                Root path to cgroup fs
      --path.proc.root="/proc"  Root path to proc fs
      --[no-]collect.proc       Boolean that sets if to collect proc information
      --collect.proc.max-exec=100
                                Max length of process executable to record
      --log.level=info          Only log messages with the given severity or above. One of: [debug, info, warn, error]
      --log.format=logfmt       Output format of log messages. One of: [logfmt, json]
      --[no-]version            Show application version.

./cgroup_exporter --version
cgroup_exporter, version 0.9.1 (branch: HEAD, revision: a0e27e49d10f2c5684e189eac9ba340fe3ff34d7)
  build user:       root@5bb7c17432fc
  build date:       20230512-20:01:34
  go version:       go1.20.4
  platform:         linux/amd64
  tags:             netgo

pmarr commented 9 months ago

Something for me to investigate next week, but we're still sending an empty result to get_data_out().

Confirmed that we are now using the cgroup_exporter from the plazonic repo.

cgroup_exporter --version
cgroup_exporter, version 0.7.0 (branch: master, revision: 8d7a698923969056dfd96bcf66b4c8e1af9e0b6d)
  build user:       root@mike001
  build date:       20230915-19:18:16
  go version:       go1.18.4
  platform:         linux/amd64

./jobstats -c mike -p http://prometheus 133329 -d

DEBUG: jobidraw=133329, start=1694808532, end=Unknown, cluster=mike, tres=billing=64,cpu=16,gres/gpu=1,mem=125G,node=1, data=, user=lsepcl1, account=test123_acct, state=RUNNING, timelimit=150, nodes=1, ncpus=16, reqmem=125G, qos=normal, partition=gpu, jobname=bash
DEBUG: jobid=133329, jobidraw=133329, start=1694808532, end=1694813817.759398, gpus=1, diff=5285.759397983551, cluster=mike, data=, timelimitraw=150
DEBUG: query=max_over_time(cgroup_memory_total_bytes{cluster='mike',jobid='133329',step='',task=''}[5285s]), time=1694813817.759398
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}

The Prometheus graph UI shows the metric with these labels:

cgroup_memory_total_bytes{instance="mike177:9306", job="mike_nodes", jobid="133329", step="0", task="0"}

But the queries we're sending above use:

cgroup_memory_total_bytes{cluster='mike',jobid='133329',step='',task=''}

The full cgroup_exporter command line:

cgroup_exporter --path.cgroup.root=/sys/fs/cgroup --config.paths=/user.slice,/slurm --collect.fullslurm
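Comparing the two label sets above, the only matcher in the jobstats query that has no counterpart on the stored series is cluster. Illustrative queries (not from the thread) that would show this in the Prometheus expression browser, using the job id from this run:

count(cgroup_memory_total_bytes{jobid='133329'})                  # matches the series shown above
count(cgroup_memory_total_bytes{jobid='133329', cluster='mike'})  # empty until a cluster label exists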

plazonic commented 9 months ago

It seems to me that you didn't add the cluster label at scrape time as part of the Prometheus config.

Josko

pmarr commented 9 months ago

You are right. I checked the Prometheus service discovery and did not see a cluster= label on the targets. We corrected the config syntax to apply the label, and jobstats now successfully reports on jobs.
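For anyone else hitting this: a minimal sketch of one way to attach the label at scrape time, assuming plain static_configs targets (the instance names are simply the ones that appear earlier in this thread; with file-based or other service discovery the label can equally be set on the targets or via relabeling):

scrape_configs:
  - job_name: mike_nodes
    static_configs:
      - targets: ['mike030:9306', 'mike177:9306']
        labels:
          cluster: mike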

Apologies for not catching that on our end. Thank you for the help debugging.

I would suggest changing the "Download the latest release" link in the README.md so that it does not point to the treydock version of cgroup_exporter, as that caused us some confusion, as seen above.

Great work on the jobstats utility!