facebookincubator / dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
MIT License
260 stars, 38 forks

add GPU process parsing from nvidia-smi #66

Closed: haowangludx closed this 1 year ago

haowangludx commented 1 year ago

Summary: Adds an API that reads the PIDs of GPU processes by parsing nvidia-smi output, because DCGM does not report PIDs correctly. The API will be used to identify the process running on each GPU and then look up that process's workflow environment metadata.
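The PR does not say how the environment metadata is read once a PID is known. On Linux, one plausible route is `/proc/<pid>/environ`, which holds the process's environment as NUL-separated `KEY=VALUE` entries. The sketch below assumes that route; `parseEnviron` and `readProcessEnv` are hypothetical names, not Dynolog functions, and which variables count as "workflow metadata" is left to the caller.

```cpp
#include <fstream>
#include <iterator>
#include <map>
#include <string>

// Hypothetical helper: splits a NUL-separated KEY=VALUE blob (the layout of
// /proc/<pid>/environ on Linux) into a map. Only the first '=' separates the
// key from the value, so values may themselves contain '='.
std::map<std::string, std::string> parseEnviron(const std::string& blob) {
  std::map<std::string, std::string> env;
  std::size_t start = 0;
  while (start < blob.size()) {
    std::size_t end = blob.find('\0', start);
    if (end == std::string::npos) {
      end = blob.size();  // last entry without a trailing NUL
    }
    const std::string entry = blob.substr(start, end - start);
    const std::size_t eq = entry.find('=');
    if (eq != std::string::npos) {
      env[entry.substr(0, eq)] = entry.substr(eq + 1);
    }
    start = end + 1;
  }
  return env;
}

// Hypothetical helper: reads the environment of a running process.
// Returns an empty map if the file cannot be opened (e.g. the process
// has exited or permissions are insufficient).
std::map<std::string, std::string> readProcessEnv(int pid) {
  std::ifstream file("/proc/" + std::to_string(pid) + "/environ",
                     std::ios::binary);
  if (!file) {
    return {};
  }
  const std::string blob((std::istreambuf_iterator<char>(file)),
                         std::istreambuf_iterator<char>());
  return parseEnviron(blob);
}
```

Reading another user's `/proc/<pid>/environ` generally requires matching credentials or elevated privileges, so a daemon using this approach would need to handle the empty-map case.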

The API returns a list of PIDs indexed by GPU ID: entry i holds the PID of the process running on GPU i, and -1 means no process is running on that GPU.
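A minimal sketch of that return convention follows. The PR does not show which nvidia-smi command or output format it parses, so this assumes the output has already been reduced to lines of the hypothetical form `<gpu_index>, <pid>`; `parseGpuPids` and its `numGpus` parameter are illustrative names, not the PR's API. Because the description specifies one PID per GPU, the sketch keeps the first PID seen for each GPU, which is also an assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds a per-GPU PID table from lines of the assumed
// form "<gpu_index>, <pid>". Index i of the result is GPU i; -1 means no
// process was reported on that GPU.
std::vector<int> parseGpuPids(const std::string& text, int numGpus) {
  std::vector<int> pids(numGpus > 0 ? numGpus : 0, -1);
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    int gpu = -1;
    int pid = -1;
    char comma = 0;
    // Skip header or malformed lines that do not match "<int>, <int>".
    if (!(fields >> gpu >> comma >> pid) || comma != ',') {
      continue;
    }
    // Ignore out-of-range GPU indices; keep the first PID per GPU.
    if (gpu >= 0 && gpu < numGpus && pids[gpu] == -1) {
      pids[gpu] = pid;
    }
  }
  return pids;
}
```

For example, the input `"0, 1234\n2, 5678\n"` with three GPUs yields `{1234, -1, 5678}`: GPU 1 reports no process, so its entry stays -1.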

Reviewed By: jj10306

Differential Revision: D41561765

LaMa Project: L1137347

facebook-github-bot commented 1 year ago

This pull request was exported from Phabricator. Differential Revision: D41561765
