NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Apache License 2.0

Error: unable to establish a connection to the specified host: localhost #43

Closed: hyoonseo159357 closed this issue 2 years ago

hyoonseo159357 commented 2 years ago

I'm using AWS, and my environment is:

I installed DCGM following https://developer.nvidia.com/dcgm (I changed the Ubuntu version in the instructions from 20 to 18 before installing).

After installing DCGM, I get the following error when I run this command:

dcgmi discovery -l

Error: unable to establish a connection to the specified host: localhost
Error: Unable to connect to host engine. Host engine connection invalid/disconnected.

dcgmi discovery -v

Version : 2.4.5
Build ID : 9
Build Date : 2022-06-03
Build Type : Release
Commit ID : 82470ec91c4a20565182d65d2b8f0ea756c70285
Branch Name : rel_dcgm_2_4
CPU Arch : x86_64
Build Platform : Linux 4.15.0-180-generic #189-Ubuntu SMP Wed May 18 14:13:57 UTC 2022 x86_64
CRC : 54832e64be3a6a8ad586bcae022ca6cb
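
The "unable to establish a connection" error means dcgmi could not reach a host engine on localhost. As a rough first check, it can help to see whether anything is listening on the host engine port at all. The sketch below is not part of DCGM: the function name is hypothetical, and it assumes the host engine uses TCP port 5555 (adjust if nv-hostengine was started on a different port).

```python
import socket

# Assumption: nv-hostengine listens on TCP port 5555 unless configured otherwise.
DEFAULT_DCGM_PORT = 5555


def host_engine_reachable(host="localhost", port=DEFAULT_DCGM_PORT, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if host_engine_reachable():
        print("Something is listening on the host engine port.")
    else:
        print("Nothing is listening; the host engine may not be running.")
```

A False result only shows that no process accepted the connection. It does not say why the host engine is not running.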

hyoonseo159357 commented 2 years ago

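The problem was caused by running it on a g4ad instance rather than a g4dn instance. g4ad instances have AMD GPUs, so there is no NVIDIA GPU for DCGM to manage; g4dn instances have NVIDIA T4 GPUs.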
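
One way to catch this kind of mismatch before installing DCGM is to look at the PCI device list and confirm that an NVIDIA GPU is actually present. The following is a minimal sketch, assuming `lspci` is available on the instance. The function name is hypothetical, and the matching is a simple text heuristic rather than anything DCGM itself does.

```python
import subprocess

# PCI class names that lspci commonly prints for GPU devices.
GPU_CLASSES = ("VGA compatible controller", "3D controller", "Display controller")


def has_nvidia_gpu(lspci_output):
    """Return True if any GPU-class line in lspci output mentions NVIDIA."""
    for line in lspci_output.splitlines():
        is_gpu_line = any(cls in line for cls in GPU_CLASSES)
        if is_gpu_line and "NVIDIA" in line.upper():
            return True
    return False


if __name__ == "__main__":
    try:
        result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("Could not run lspci; install pciutils or check manually.")
    else:
        if has_nvidia_gpu(result.stdout):
            print("NVIDIA GPU found.")
        else:
            print("No NVIDIA GPU found; DCGM has nothing to manage here.")
```

On an instance without NVIDIA hardware the check reports that no NVIDIA GPU was found, which points to the instance type rather than the DCGM installation.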