nvidiagpubeat is an Elastic Beat that uses the NVIDIA System Management Interface (nvidia-smi) to monitor NVIDIA GPU devices and can ingest their metrics into an Elasticsearch cluster. It supports both the 6.x and 7.x versions of Beats. nvidia-smi is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
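Because the beat collects its metrics by running nvidia-smi and parsing what it prints, a small Go sketch of that general pattern may help when reading the stack trace in the log. It assumes nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options. The field list and the helpers `parseQueryOutput` and `queryGPUs` are hypothetical illustrations, not nvidiagpubeat's actual code. The column-count check is the point of the sketch: indexing a split line without it is one way a "runtime error: index out of range" panic can arise when the output is not what the parser expects.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// queryFields is a hypothetical field list for illustration; the beat's own
// query comes from its configuration and may differ.
var queryFields = []string{"index", "utilization.gpu", "utilization.memory", "temperature.gpu"}

// parseQueryOutput turns `nvidia-smi --format=csv,noheader,nounits` output
// into one map per GPU. It checks the column count before indexing, so
// unexpected output produces an error instead of an index-out-of-range panic.
func parseQueryOutput(out string, fields []string) ([]map[string]string, error) {
	var gpus []map[string]string
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		if line == "" {
			continue // empty output or blank line
		}
		cols := strings.Split(line, ",")
		if len(cols) != len(fields) {
			return nil, fmt.Errorf("expected %d columns, got %d in %q", len(fields), len(cols), line)
		}
		gpu := make(map[string]string, len(fields))
		for i, f := range fields {
			gpu[f] = strings.TrimSpace(cols[i]) // nvidia-smi separates columns with ", "
		}
		gpus = append(gpus, gpu)
	}
	if len(gpus) == 0 {
		return nil, fmt.Errorf("no GPU rows in nvidia-smi output")
	}
	return gpus, nil
}

// queryGPUs runs nvidia-smi with the given fields and parses its CSV output.
func queryGPUs(fields []string) ([]map[string]string, error) {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu="+strings.Join(fields, ","),
		"--format=csv,noheader,nounits").Output()
	if err != nil {
		return nil, fmt.Errorf("running nvidia-smi: %w", err)
	}
	return parseQueryOutput(string(out), fields)
}

func main() {
	gpus, err := queryGPUs(queryFields)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, g := range gpus {
		fmt.Println(g)
	}
}
```

Returning an error instead of panicking lets the caller log the unexpected output, which is usually more informative than a stack trace.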
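Can someone help me solve this issue, @deepujain?
nvidia-smi is installed on my machine, but nvidiagpubeat fails with a "runtime error: index out of range" panic shortly after starting. Log: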
2021-02-03T17:33:10.784Z INFO instance/beat.go:592 Home path: [/usr/share/nvidiagpubeat] Config path: [/usr/share/nvidiagpubeat] Data path: [/usr/share/nvidiagpubeat/data] Logs path: [/usr/share/nvidiagpubeat/logs]
2021-02-03T17:33:10.784Z INFO instance/beat.go:599 Beat UUID: 0250fd46-f397-4def-9098-ad27429e08c2
2021-02-03T17:33:10.784Z INFO [beat] instance/beat.go:825 Beat info {"system_info": {"beat": {"path": {"config": "/usr/share/nvidiagpubeat", "data": "/usr/share/nvidiagpubeat/data", "home": "/usr/share/nvidiagpubeat", "logs": "/usr/share/nvidiagpubeat/logs"}, "type": "nvidiagpubeat", "uuid": "0250fd46-f397-4def-9098-ad27429e08c2"}}}
2021-02-03T17:33:10.784Z INFO [beat] instance/beat.go:834 Build info {"system_info": {"build": {"commit": "unknown", "libbeat": "6.5.5", "time": "1754-08-30T22:43:41.128Z", "version": "6.5.5"}}}
2021-02-03T17:33:10.786Z INFO [beat] instance/beat.go:837 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":48,"version":"go1.12.5"}}}
2021-02-03T17:33:10.788Z INFO [beat] instance/beat.go:841 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2021-01-22T06:04:54Z","containerized":true,"name":"8bb58f1899fe","ip":["127.0.0.1/8","172.17.0.2/16"],"kernel_version":"3.10.0-1160.11.1.el7.x86_64","mac":["02:42:ac:11:00:02"],"os":{"family":"redhat","platform":"centos","name":"CentOS Linux","version":"7 (Core)","major":7,"minor":9,"patch":2009,"codename":"Core"},"timezone":"UTC","timezone_offset_sec":0,"id":"d097cfbbf25e4ea2b5a1f0b530456ab7"}}}
2021-02-03T17:33:10.788Z INFO [beat] instance/beat.go:870 Process info {"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"permitted":null,"effective":null,"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"ambient":null}, "cwd": "/usr/share/nvidiagpubeat", "exe": "/usr/share/nvidiagpubeat/nvidiagpubeat", "name": "nvidiagpubeat", "pid": 1, "ppid": 0, "seccomp": {"mode":"filter","no_new_privs":false}, "start_time": "2021-02-03T17:33:09.899Z"}}}
2021-02-03T17:33:10.788Z INFO instance/beat.go:278 Setup Beat: nvidiagpubeat; Version: 6.5.5
2021-02-03T17:33:10.789Z INFO elasticsearch/client.go:163 Elasticsearch url: http://xx.xx.xx.xx:9210
2021-02-03T17:33:10.789Z INFO [publisher] pipeline/module.go:110 Beat name: 8bb58f1899fe
2021-02-03T17:33:10.789Z INFO instance/beat.go:400 nvidiagpubeat start running.
2021-02-03T17:33:10.789Z INFO beater/nvidiagpubeat.go:57 nvidiagpubeat is running for production environment. ! Hit CTRL-C to stop it.
2021-02-03T17:33:10.789Z INFO [monitoring] log/log.go:117 Starting metrics logging every 30s
2021-02-03T17:33:40.792Z INFO [monitoring] log/log.go:144 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":{"ms":20}},"total":{"ticks":50,"time":{"ms":58},"value":50},"user":{"ticks":30,"time":{"ms":38}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":7},"info":{"ephemeral_id":"391584c9-bc61-4868-a440-9886fba4a756","uptime":{"ms":30011}},"memstats":{"gc_next":4194304,"memory_alloc":2017704,"memory_total":3635056,"rss":14045184}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"cpu":{"cores":48},"load":{"1":0.21,"15":0.25,"5":0.24,"norm":{"1":0.0044,"15":0.0052,"5":0.005}}}}}}
2021-02-03T17:33:40.798Z INFO [monitoring] log/log.go:152 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":{"ms":22}},"total":{"ticks":50,"time":{"ms":61},"value":50},"user":{"ticks":30,"time":{"ms":39}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":6},"info":{"ephemeral_id":"391584c9-bc61-4868-a440-9886fba4a756","uptime":{"ms":30019}},"memstats":{"gc_next":4194304,"memory_alloc":2521296,"memory_total":4138648,"rss":14045184}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":1,"events":{"active":0}}},"system":{"cpu":{"cores":48},"load":{"1":0.21,"15":0.25,"5":0.24,"norm":{"1":0.0044,"15":0.0052,"5":0.005}}}}}}
2021-02-03T17:33:40.798Z INFO [monitoring] log/log.go:153 Uptime: 30.020403448s
2021-02-03T17:33:40.798Z INFO [monitoring] log/log.go:130 Stopping metrics logging.
2021-02-03T17:33:40.798Z INFO runtime/panic.go:522 nvidiagpubeat stopped.
2021-02-03T17:33:40.798Z FATAL [nvidiagpubeat] instance/beat.go:154 Failed due to panic. {"panic": "runtime error: index out of range", "stack": "github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance.Run.func1.1\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance/beat.go:155\nruntime.gopanic\n\t/opt/go/src/runtime/panic.go:522\nruntime.panicindex\n\t/opt/go/src/runtime/panic.go:44\ngithub.com/ebay/nvidiagpubeat/nvidia.Utilization.run\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/nvidia/gpu.go:123\ngithub.com/ebay/nvidiagpubeat/nvidia.Metrics.Get\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/nvidia/metrics.go:50\ngithub.com/ebay/nvidiagpubeat/beater.(*Nvidiagpubeat).Run\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/beater/nvidiagpubeat.go:73\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance.(*Beat).launch\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance/beat.go:410\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance.Run.func1\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance/beat.go:181\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance.Run\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/instance/beat.go:182\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd.genRunCmd.func1\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/libbeat/cmd/run.go:37\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).execute\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:704\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:785\ngithub.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/vendor/github.com/elastic/beats/vendor/github.com/spf13/cobra/command.go:738\nmain.main\n\t/usr/build/nvidiagpubuild/beats_dev/src/github.com/ebay/nvidiagpubeat/main.go:34\nruntime.main\n\t/opt/go/src/runtime/proc.go:200"}
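The Host info line reports "containerized":true, so nvidia-smi being installed on the host machine does not necessarily mean it is visible to the beat's process. One thing that may be worth ruling out is whether nvidia-smi resolves and runs from inside the same container. Below is a minimal Go sketch of such a check, assuming it is built and run in that container; `checkBinary` is a hypothetical helper, not part of nvidiagpubeat. It uses `exec.LookPath` to search the process's PATH and then runs `nvidia-smi -L`, which lists the GPUs the process can see.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// checkBinary reports whether name resolves on this process's PATH and,
// if it does, whether running it with args exits successfully.
func checkBinary(name string, args ...string) error {
	path, err := exec.LookPath(name)
	if err != nil {
		return fmt.Errorf("%s not found on PATH (%s): %w", name, os.Getenv("PATH"), err)
	}
	out, err := exec.Command(path, args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("%s %v failed: %w: %s", path, args, err, out)
	}
	return nil
}

func main() {
	// "nvidia-smi -L" lists the GPUs visible to this process.
	if err := checkBinary("nvidia-smi", "-L"); err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("nvidia-smi is reachable and ran successfully")
}
```

Running the built binary through `docker exec` in the beat's container exercises the same PATH and device access the beat itself sees.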