XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
https://nvitop.readthedocs.io
Apache License 2.0
4.56k stars 144 forks

[BUG] Pytorch lightning callback #112

Closed by marios1861 8 months ago

marios1861 commented 8 months ago

What version of nvitop are you using?

1.3.1

Operating system and version

Ubuntu 20.04.6 LTS

NVIDIA driver version

535.129.03

NVIDIA-SMI

No response

Python environment

poetry environment

Problem description

When using the PyTorch Lightning callback, the is_overridden function in lightning.pytorch.utilities.model_helpers raises ValueError("Expected a parent"), even though

elif isinstance(instance, pl.Callback):
  parent = pl.Callback

should be the chosen branch; isinstance(instance, pl.Callback) returns False. After further inspection, the callback is built on the older package pytorch_lightning, which can be replaced with lightning.pytorch without any further changes.
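To illustrate the failure mode, here is a minimal sketch that does not import Lightning at all. NewCallback and LegacyCallback are hypothetical stand-ins for lightning.pytorch.Callback and pytorch_lightning.Callback, and is_overridden_sketch is a simplified imitation of the parent lookup quoted above, not the library's actual code. It only shows that a class derived from one base is not an instance of the other, so no parent is found.

```python
from typing import Optional, Type


class NewCallback:
    """Hypothetical stand-in for lightning.pytorch.Callback."""

    def state_dict(self) -> dict:
        return {}


class LegacyCallback:
    """Hypothetical stand-in for pytorch_lightning.Callback."""

    def state_dict(self) -> dict:
        return {}


class LegacyBasedLogger(LegacyCallback):
    """Plays the role of a callback built on the legacy base class."""


class RebasedLogger(NewCallback):
    """The same callback rebased onto the new base class."""


def is_overridden_sketch(
    method_name: str, instance: object, parent: Optional[Type] = None
) -> bool:
    # Simplified assumption: only the Callback branch from the issue is modelled.
    if parent is None:
        if isinstance(instance, NewCallback):
            parent = NewCallback
        if parent is None:
            raise ValueError("Expected a parent")
    # A method counts as overridden when it is not the parent's own function.
    return getattr(type(instance), method_name) is not getattr(parent, method_name)


def check(instance: object) -> str:
    """Run the lookup and report either 'ok' or the error message."""
    try:
        is_overridden_sketch("state_dict", instance)
    except ValueError as exc:
        return f"ValueError: {exc}"
    return "ok"


legacy_result = check(LegacyBasedLogger())  # no matching parent is found
rebased_result = check(RebasedLogger())     # parent resolves to NewCallback
print(legacy_result)
print(rebased_result)
```

Under these assumptions, the legacy-based class reproduces the error message from the traceback, while the rebased class passes the check.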

Steps to Reproduce

gpu_stats = GpuStatsLogger()
trainer = pl.Trainer(logger=etc, callbacks=[gpu_stats])

Traceback

Traceback (most recent call last):
  File "...", line 40, in <module>
    trainer = pl.Trainer(..., callbacks=[callback])
  File "/.../lib/python3.8/site-packages/lightning/pytorch/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
  File "/.../lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 431, in __init__
    self._callback_connector.on_trainer_init(
  File "/.../lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/callback_connector.py", line 79, in on_trainer_init
    _validate_callbacks_list(self.trainer.callbacks)
  File "/.../lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/callback_connector.py", line 227, in _validate_callbacks_list
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/.../lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/callback_connector.py", line 227, in <listcomp>
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/.../lib/python3.8/site-packages/lightning/pytorch/utilities/model_helpers.py", line 39, in is_overridden
    raise ValueError("Expected a parent")

Logs

No response

Expected behavior

No response

Additional context

No response

XuehaiPan commented 8 months ago

Hi @marios1861, does lightning.pytorch.callbacks.DeviceStatsMonitor fit your use case? The callback in nvitop has not been updated for years. I would rather mark it as deprecated, because we don't have a release schedule that's aligned with Lightning-AI/lightning. See my previous discussion in https://github.com/XuehaiPan/nvitop/pull/84#issuecomment-1663400377.

XuehaiPan commented 8 months ago

> When using the PyTorch Lightning callback, the is_overridden function in lightning.pytorch.utilities.model_helpers raises ValueError("Expected a parent"), even though
>
> elif isinstance(instance, pl.Callback):
>   parent = pl.Callback
>
> should be the chosen branch; isinstance(instance, pl.Callback) returns False. After further inspection, the callback is built on the older package pytorch_lightning, which can be replaced with lightning.pytorch without any further changes.

@marios1861 I guess you are using lightning.pytorch rather than pytorch_lightning, right?
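One way to check which package a callback is actually built on is to look at the modules in its method resolution order. The sketch below is only an illustration: base_packages is a hypothetical helper, and FakeLegacyBase has its module name set by hand so the example runs without either package installed.

```python
from typing import List


def base_packages(obj: object) -> List[str]:
    """Return the top-level packages that define the classes in obj's MRO."""
    seen: List[str] = []
    for cls in type(obj).__mro__:
        top_level = cls.__module__.split(".")[0]
        if top_level not in seen:
            seen.append(top_level)
    return seen


class FakeLegacyBase:
    """Hypothetical base class pretending to live in pytorch_lightning."""


# Set by hand for illustration only; a real base class carries its true module.
FakeLegacyBase.__module__ = "pytorch_lightning.callbacks.callback"


class MyCallback(FakeLegacyBase):
    """Hypothetical user callback derived from the fake legacy base."""


packages = base_packages(MyCallback())
print(packages)
```

If the list for a real callback contains pytorch_lightning but not lightning, the callback derives from the legacy package, and isinstance checks against lightning.pytorch.Callback would be expected to fail.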

marios1861 commented 8 months ago

Hello @XuehaiPan, yes, I am using lightning.pytorch (the package was renamed). lightning.pytorch.callbacks.DeviceStatsMonitor is fine as well, but with adapter modules (nvitop and lightning in this case) it's not very clear where the responsibility for the implementation lies. Since lightning provides the base class for implementing callbacks, I would provide the implementation in nvitop. If you decide otherwise, that's fine too.