eBay / nvidiagpubeat

nvidiagpubeat is an Elastic Beat that uses the NVIDIA System Management Interface (nvidia-smi) to monitor NVIDIA GPU devices and can ingest metrics into an Elasticsearch cluster, with support for both the 6.x and 7.x versions of Beats. nvidia-smi is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
https://github.com/eBay/nvidiagpubeat
Apache License 2.0

FATAL [nvidiagpubeat] instance/beat.go:154 Failed due to panic. {"panic": "runtime error: index out of range", "stack": #33

Closed anaconda2196 closed 3 years ago

anaconda2196 commented 3 years ago

Can someone help me solve this issue, @deepujain? (nvidia-smi is installed on my machine.)

2021-02-03T17:33:10.784Z INFO instance/beat.go:592 Home path: [/usr/share/nvidiagpubeat] Config path: [/usr/share/nvidiagpubeat] Data path: [/usr/share/nvidiagpubeat/data] Logs path: [/usr/share/nvidiagpubeat/logs]
2021-02-03T17:33:10.784Z INFO instance/beat.go:599 Beat UUID: 0250fd46-f397-4def-9098-ad27429e08c2
2021-02-03T17:33:10.784Z INFO [beat] instance/beat.go:825 Beat info {"system_info": {"beat": {"path": {"config": "/usr/share/nvidiagpubeat", "data": "/usr/share/nvidiagpubeat/data", "home": "/usr/share/nvidiagpubeat", "logs": "/usr/share/nvidiagpubeat/logs"}, "type": "nvidiagpubeat", "uuid": "0250fd46-f397-4def-9098-ad27429e08c2"}}}
2021-02-03T17:33:10.784Z INFO [beat] instance/beat.go:834 Build info {"system_info": {"build": {"commit": "unknown", "libbeat": "6.5.5", "time": "1754-08-30T22:43:41.128Z", "version": "6.5.5"}}}
2021-02-03T17:33:10.786Z INFO [beat] instance/beat.go:837 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":48,"version":"go1.12.5"}}}
2021-02-03T17:33:10.788Z INFO [beat] instance/beat.go:841 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2021-01-22T06:04:54Z","containerized":true,"name":"8bb58f1899fe","ip":["127.0.0.1/8","172.17.0.2/16"],"kernel_version":"3.10.0-1160.11.1.el7.x86_64","mac":["02:42:ac:11:00:02"],"os":{"family":"redhat","platform":"centos","name":"CentOS Linux","version":"7 (Core)","major":7,"minor":9,"patch":2009,"codename":"Core"},"timezone":"UTC","timezone_offset_sec":0,"id":"d097cfbbf25e4ea2b5a1f0b530456ab7"}}}
2021-02-03T17:33:10.788Z INFO [beat] instance/beat.go:870 Process info {"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"permitted":null,"effective":null,"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"ambient":null}, "cwd": "/usr/share/nvidiagpubeat", "exe": "/usr/share/nvidiagpubeat/nvidiagpubeat", "name": "nvidiagpubeat", "pid": 1, "ppid": 0, "seccomp": {"mode":"filter","no_new_privs":false}, "start_time": "2021-02-03T17:33:09.899Z"}}}
2021-02-03T17:33:10.788Z INFO instance/beat.go:278 Setup Beat: nvidiagpubeat; Version: 6.5.5
2021-02-03T17:33:10.789Z INFO elasticsearch/client.go:163 Elasticsearch url: http://xx.xx.xx.xx:9210
2021-02-03T17:33:10.789Z INFO [publisher] pipeline/module.go:110 Beat name: 8bb58f1899fe
2021-02-03T17:33:10.789Z INFO instance/beat.go:400 nvidiagpubeat start running.
2021-02-03T17:33:10.789Z INFO beater/nvidiagpubeat.go:57 nvidiagpubeat is running for production environment. ! Hit CTRL-C to stop it.
2021-02-03T17:33:10.789Z INFO [monitoring] log/log.go:117 Starting metrics logging every 30s
2021-02-03T17:33:40.792Z INFO [monitoring] log/log.go:144 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":{"ms":20}},"total":{"ticks":50,"time":{"ms":58},"value":50},"user":{"ticks":30,"time":{"ms":38}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":7},"info":{"ephemeral_id":"391584c9-bc61-4868-a440-9886fba4a756","uptime":{"ms":30011}},"memstats":{"gc_next":4194304,"memory_alloc":2017704,"memory_total":3635056,"rss":14045184}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"cpu":{"cores":48},"load":{"1":0.21,"15":0.25,"5":0.24,"norm":{"1":0.0044,"15":0.0052,"5":0.005}}}}}}
2021-02-03T17:33:40.798Z INFO [monitoring] log/log.go:152 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":{"ms":22}},"total":{"ticks":50,"time":{"ms":61},"value":50},"user":{"ticks":30,"time":{"ms":39}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":6},"info":{"ephemeral_id":"391584c9-bc61-4868-a440-9886fba4a756","uptime":{"ms":30019}},"memstats":{"gc_next":4194304,"memory_alloc":2521296,"memory_total":4138648,"rss":14045184}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"cpu":{"cores":48},"load":{"1":0.21,"15":0.25,"5":0.24,"norm":{"1":0.0044,"15":0.0052,"5":0.005}}}}}}
2021-02-03T17:33:40.798Z INFO [monitoring] log/log.go:153 Uptime: 30.020403448s
2021-02-03T17:33:40.798Z INFO [monitoring] log/log.go:130 Stopping metrics logging.
2021-02-03T17:33:40.798Z INFO runtime/panic.go:522 nvidiagpubeat stopped.
2021-02-03T17:33:40.798Z FATAL [nvidiagpubeat] instance/beat.go:154 Failed due to panic. {"panic": "runtime error: index out of range", "stack": "github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance.Run.func1.1\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance/beat.go:155\nruntime.gopanic\n\t/opt/go/src/runtime/panic.go:522\nruntime.panicindex\n\t/opt/go/src/runtime/panic.go:44\ngithub.com/ebay/nvidiagpubeat/nvidia.Utilization.run\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/nvidia/gpu.go:123\ngithub.com/ebay/nvidiagpubeat/nvidia.Metrics.Get\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/nvidia/metrics.go:50\ngithub.com/ebay/nvidiagpubeat/beater.(*Nvidiagpubeat).Run\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/beater/nvidiagpubeat.go:73\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance.(*Beat).launch\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance/beat.go:410\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance.Run.func1\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance/beat.go:181\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance.Run\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance/beat.go:182\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd.genRunCmd.func1\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/run.go:37\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).execute\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:704\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:785\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:738\nmain.main\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/main.go:34\nruntime.main\n\t/opt/go/src/runtime/proc.go:200"}
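
For context on the panic itself: the stack points at nvidia.Utilization.run (nvidia/gpu.go:123), and Go raises "index out of range" whenever a slice is indexed past its length. What that line actually does is not shown in this report, so the following is only a minimal sketch of one common way such a panic can arise and be avoided, assuming the code splits a comma-separated row of command output into columns. The function name parseGPURow and the sample rows are hypothetical and are not taken from nvidiagpubeat.

```go
package main

import (
	"fmt"
	"strings"
)

// parseGPURow is a hypothetical helper (not part of nvidiagpubeat).
// It splits one comma-separated row into named fields and returns an
// error, instead of panicking, when the row has fewer columns than
// expected (for example an empty line or an error message).
func parseGPURow(row string, fields []string) (map[string]string, error) {
	cols := strings.Split(row, ",")
	if len(cols) < len(fields) {
		return nil, fmt.Errorf("expected %d columns, got %d in %q",
			len(fields), len(cols), row)
	}
	out := make(map[string]string, len(fields))
	for i, name := range fields {
		// Safe: i < len(fields) <= len(cols) was checked above.
		out[name] = strings.TrimSpace(cols[i])
	}
	return out, nil
}

func main() {
	// Field names and rows below are made-up sample data.
	fields := []string{"utilization.gpu", "memory.used"}
	rows := []string{"35 %, 1024 MiB", "command not found"}
	for _, row := range rows {
		m, err := parseGPURow(row, fields)
		if err != nil {
			fmt.Println("skipping row:", err)
			continue
		}
		fmt.Println(m)
	}
}
```

Without the length check, indexing cols[1] on the second sample row would panic with the same runtime error seen in the log. If the real cause is similar, running the nvidia-smi query by hand inside the container and inspecting its raw output may show why fewer columns than expected came back.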