XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
https://nvitop.readthedocs.io
Apache License 2.0
4.56k stars, 144 forks

[Question] How snapshot could be used #119

Closed Xiang-cd closed 6 months ago

Xiang-cd commented 7 months ago

Questions

Thank you for your great work! I've seen the as_snapshot function, but I was wondering how the snapshot could be used. Is the snapshot resumable? I ask because I didn't see a resume interface.

XuehaiPan commented 7 months ago

@Xiang-cd Thanks for raising this.

A Device / GpuProcess instance is a live object: each method call fetches the latest metrics. E.g.:

import time

from nvitop import Device

device = Device(0)

device.gpu_utilization()  # -> 96
time.sleep(1)
device.gpu_utilization()  # -> 80  # the latest value from a new NVML API call

Once it is converted to a snapshot, the metrics are frozen. The metric values are those obtained at the time you call as_snapshot().

device = Device(0)

snapshot = device.as_snapshot()
snapshot.gpu_utilization  # -> 94
time.sleep(1)
snapshot.gpu_utilization  # -> 94 (always the frozen value)
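
To make the live-versus-frozen distinction concrete without a GPU, here is a toy model of the pattern. `FakeDevice` is a hypothetical stand-in written only for illustration. It is not part of nvitop and says nothing about how nvitop implements snapshots internally; it only mimics the behavior described above (methods on the live object, plain attributes on the snapshot).

```python
import itertools
from types import SimpleNamespace


class FakeDevice:
    """Hypothetical stand-in for a live device object (not part of nvitop)."""

    def __init__(self, readings):
        # Cycle through canned readings to imitate a changing metric.
        self._readings = itertools.cycle(readings)

    def gpu_utilization(self):
        # Each call returns a fresh reading, as a new query would.
        return next(self._readings)

    def as_snapshot(self):
        # Query once and store the result as a plain attribute.
        return SimpleNamespace(real=self, gpu_utilization=self.gpu_utilization())


device = FakeDevice([96, 80, 94, 70])

# Live object: two method calls, two different readings.
live_values = [device.gpu_utilization(), device.gpu_utilization()]

# Snapshot: one reading taken at creation time, then only attribute access.
snapshot = device.as_snapshot()
frozen_values = [snapshot.gpu_utilization, snapshot.gpu_utilization]
```

In this toy model the two live calls differ, while the snapshot attribute keeps returning the value read when as_snapshot() was called.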

You can get back to the live device object from a snapshot via its real attribute:

snapshot = device.as_snapshot()

snapshot.real  # -> Device(...)

# Get a new snapshot
snapshot = snapshot.real.as_snapshot()
# or
snapshot = device.as_snapshot()
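
Since a snapshot never updates, tracking a metric over time means taking a fresh snapshot for each sample. A minimal sketch of that loop follows. `sample_utilization` is a hypothetical helper name, not an nvitop function, and it assumes only what the examples above show: an object with an `as_snapshot()` method whose result has a `gpu_utilization` attribute.

```python
import time


def sample_utilization(device, count=3, interval=1.0):
    """Collect `count` frozen readings by taking a new snapshot each time.

    Assumes `device` has an `as_snapshot()` method whose result exposes a
    `gpu_utilization` attribute. This helper is illustrative, not nvitop API.
    """
    samples = []
    for i in range(count):
        snapshot = device.as_snapshot()  # values are fixed from this point on
        samples.append(snapshot.gpu_utilization)
        if i < count - 1:
            time.sleep(interval)  # wait before taking the next snapshot
    return samples
```

With nvitop installed and a GPU present, this could be called as `sample_utilization(Device(0))`.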

Xiang-cd commented 6 months ago

Thank you. So the snapshot is not a context containing all the memory contents and register state, one that could be used to move the entire context from one device to another?