Closed NAshwinKumar closed 5 years ago
ls /dev | grep nvidia | grep -v nvidia-uvm | grep -v nvidiactl | wc -l
nvidiagpubeat --query-gpu=utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,temperature.gpu,pstate --format=csv
1) Full stack trace
2019-09-04T21:08:38.188+0530 INFO instance/beat.go:607 Home path: [/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat] Config path: [/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat] Data path: [/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/data] Logs path: [/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/logs]
2019-09-04T21:08:38.188+0530 DEBUG [beat] instance/beat.go:659 Beat metadata path: /home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/data/meta.json
2019-09-04T21:08:38.188+0530 INFO instance/beat.go:615 Beat ID: 68386e1f-0080-4249-ae78-5278a46d79ac
2019-09-04T21:08:38.189+0530 INFO [beat] instance/beat.go:903 Beat info {"system_info": {"beat": {"path": {"config": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat", "data": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/data", "home": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat", "logs": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/logs"}, "type": "nvidiagpubeat", "uuid": "68386e1f-0080-4249-ae78-5278a46d79ac"}}}
2019-09-04T21:08:38.189+0530 INFO [beat] instance/beat.go:912 Build info {"system_info": {"build": {"commit": "unknown", "libbeat": "7.3.2", "time": "1754-08-30T22:43:41.128Z", "version": "7.3.2"}}}
2019-09-04T21:08:38.189+0530 INFO [beat] instance/beat.go:915 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":4,"version":"go1.12.9"}}}
2019-09-04T21:08:38.192+0530 INFO [beat] instance/beat.go:919 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2019-09-03T20:40:11+05:30","containerized":false,"name":"linux-d4hc","ip":["127.0.0.1/8","::1/128","192.168.29.221/24","2405:201:e806:9f60:29d1:864b:af2b:f9f0/64","2405:201:e806:9f60:7a45:61ff:fec0:c319/64","fe80::7a45:61ff:fec0:c319/64"],"kernel_version":"4.12.14-lp151.27-default","mac":["c8:5b:76:68:99:f7","78:45:61:c0:c3:19"],"os":{"family":"","platform":"opensuse-leap","name":"openSUSE Leap","version":"15.1","major":15,"minor":1,"patch":0},"timezone":"IST","timezone_offset_sec":19800,"id":"1ae32b0454884a1cac7ab936ce597373"}}}
2019-09-04T21:08:38.193+0530 INFO [beat] instance/beat.go:948 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":null,"effective":null,"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"ambient":null}, "cwd": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat", "exe": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/nvidiagpubeat", "name": "nvidiagpubeat", "pid": 2913, "ppid": 27508, "seccomp": {"mode":"disabled","no_new_privs":false}, "start_time": "2019-09-04T21:08:37.370+0530"}}}
2019-09-04T21:08:38.193+0530 INFO instance/beat.go:292 Setup Beat: nvidiagpubeat; Version: 7.3.2
2019-09-04T21:08:38.194+0530 DEBUG [beat] instance/beat.go:318 Initializing output plugins
2019-09-04T21:08:38.194+0530 INFO [index-management] idxmgmt/std.go:178 Set output.elasticsearch.index to 'nvidiagpubeat-7.3.2' as ILM is enabled.
2019-09-04T21:08:38.194+0530 INFO elasticsearch/client.go:170 Elasticsearch url: http://localhost:9200
2019-09-04T21:08:38.195+0530 DEBUG [publisher] pipeline/consumer.go:137 start pipeline event consumer
2019-09-04T21:08:38.195+0530 INFO [publisher] pipeline/module.go:97 Beat name: linux-d4hc
2019-09-04T21:08:38.196+0530 INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2019-09-04T21:08:38.196+0530 INFO instance/beat.go:422 nvidiagpubeat start running.
2019-09-04T21:08:38.196+0530 INFO beater/nvidiagpubeat.go:57 nvidiagpubeat is running for ** test ** environment. ! Hit CTRL-C to stop it.
2019-09-04T21:08:39.205+0530 ERROR beater/nvidiagpubeat.go:75 Event not generated, error: Unable to fetch any events from nvidia-smi: Error read |0: file already closed
2019-09-04T21:08:40.207+0530 ERROR beater/nvidiagpubeat.go:75 Event not generated, error: Unable to fetch any events from nvidia-smi: Error read |0: file already closed
^C2019-09-04T21:08:41.178+0530 DEBUG [service] service/service.go:53 Received sigterm/sigint, stopping
2019-09-04T21:08:41.178+0530 DEBUG [publisher] pipeline/client.go:149 client: closing acker
2019-09-04T21:08:41.178+0530 DEBUG [publisher] pipeline/client.go:151 client: done closing acker
2019-09-04T21:08:41.178+0530 DEBUG [publisher] pipeline/client.go:155 client: cancelled 0 events
2019-09-04T21:08:41.184+0530 INFO [monitoring] log/log.go:153 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":{"ms":24}},"total":{"ticks":60,"time":{"ms":64},"value":60},"user":{"ticks":40,"time":{"ms":40}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":5},"info":{"ephemeral_id":"f413aa9b-3ef6-4b77-998e-6e1f39166bc3","uptime":{"ms":3010}},"memstats":{"gc_next":4194304,"memory_alloc":1289056,"memory_total":3070424,"rss":23531520},"runtime":{"goroutines":8}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":0,"events":{"active":0}}},"system":{"cpu":{"cores":4},"load":{"1":1.65,"15":2.23,"5":2.1,"norm":{"1":0.4125,"15":0.5575,"5":0.525}}}}}}
2019-09-04T21:08:41.185+0530 INFO [monitoring] log/log.go:154 Uptime: 3.016770154s
2019-09-04T21:08:41.185+0530 INFO [monitoring] log/log.go:131 Stopping metrics logging.
2019-09-04T21:08:41.185+0530 INFO instance/beat.go:432 nvidiagpubeat stopped.
OS: openSUSE Leap 15.1
ashwin@linux-d4hc:~> nvidia-smi
If 'nvidia-smi' is not a typo you can use command-not-found to lookup the package that contains it, like this:
cnf nvidia-smi
ashwin@linux-d4hc:~> ls /dev | grep nvidia | grep -v nvidia-uvm | grep -v nvidiactl | wc -l
0
ashwin@linux-d4hc:~/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat> nvidiagpubeat --query-gpu=utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,temperature.gpu,pstate --format=csv
Error: unknown flag: --query-gpu
Usage:
nvidiagpubeat [flags]
nvidiagpubeat [command]
Available Commands:
  export      Export current config or index template
  help        Help about any command
  keystore    Manage secrets keystore
  run         Run nvidiagpubeat
  setup       Setup index template, dashboards and ML jobs
  test        Test config
  version     Show current version info
Flags:
  -E, --E setting=value      Configuration overwrite
  -N, --N                    Disable actual publishing for testing
  -c, --c string             Configuration file, relative to path.config (default "nvidiagpubeat.yml")
      --cpuprofile string    Write cpu profile to file
  -d, --d string             Enable certain debug selectors
  -e, --e                    Log to stderr and disable syslog/file output
  -h, --help                 help for nvidiagpubeat
      --httpprof string      Start pprof http server
      --memprofile string    Write memory profile to this file
      --path.config string   Configuration path
      --path.data string     Data path
      --path.home string     Home path
      --path.logs string     Logs path
      --plugin pluginList    Load additional plugins
      --strict.perms         Strict permission checking on config files (default true)
  -v, --v                    Log at INFO level
Use "nvidiagpubeat [command] --help" for more information about a command.
I can add checks and return an appropriate error message if this turns out to be the root cause of the issue.
nvidia-smi --query-gpu=utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,temperature.gpu,pstate --format=csv
Error: unknown flag: --query-gpu
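The kind of check mentioned above could look roughly like the following. This is only a sketch, not the actual nvidiagpubeat code: `checkBinary` is a hypothetical helper, and it assumes the beat locates nvidia-smi through PATH. It uses the standard library's `exec.LookPath` to fail early with a readable message instead of the "file already closed" read error.

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkBinary returns the resolved path of the named executable, or a
// descriptive error if it cannot be found on PATH.
// Hypothetical helper; not part of nvidiagpubeat.
func checkBinary(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%q not found on PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := checkBinary("nvidia-smi")
	if err != nil {
		fmt.Println("preflight check failed:", err)
		return
	}
	fmt.Println("using", path)
}
```

Running a check like this once at startup would surface a missing nvidia-smi immediately rather than on every collection interval.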
I noticed this in your logs:
2019-09-04T21:08:38.196+0530 INFO instance/beat.go:422 nvidiagpubeat start running.
2019-09-04T21:08:38.196+0530 INFO beater/nvidiagpubeat.go:57 nvidiagpubeat is running for ** test ** environment. ! Hit CTRL-C to stop it.
And you are running on openSUSE Linux.
https://github.com/eBay/nvidiagpubeat#run-in-test-environment-macos indicates that "test" mode is supported on macOS. Test mode uses localnvidiasmi, an executable that is built on and for macOS, so it cannot run on Linux.
Do you want to work on https://github.com/eBay/nvidiagpubeat/issues/25? It would fix the current issue.
Thanks, deepujain. Installing nvidia-smi solved the issue.
Can someone help in solving the issue?