Lightning-AI / pytorch-lightning


Cannot pass `schedule` for `PyTorchProfiler` using `LightningCLI` #20074

Open tensorcopy opened 1 month ago

tensorcopy commented 1 month ago

Bug description

In config file, something like this would work:

  profiler:
    class_path: lightning.pytorch.profilers.PyTorchProfiler
    init_args:
      filename: perf_logs
      export_to_chrome: True
    dict_kwargs:
      with_stack: true

`dict_kwargs` is the recommended way to pass PyTorch profiler arguments, since it avoids validation. But `schedule` needs to be a callable. How should I pass `torch.profiler.schedule`?
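
For context, this is the plain-Python equivalent of what I'm trying to express in the config (a minimal sketch; `schedule` and `with_stack` are among the extra kwargs that `PyTorchProfiler` forwards to `torch.profiler.profile`):

from lightning.pytorch.profilers import PyTorchProfiler
from torch.profiler import schedule

profiler = PyTorchProfiler(
    filename="perf_logs",
    export_to_chrome=True,
    # Extra keyword arguments are forwarded to torch.profiler.profile:
    schedule=schedule(skip_first=10, wait=1, warmup=1, active=2, repeat=1),
    with_stack=True,
)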

I tried this

  profiler:
    class_path: lightning.pytorch.profilers.PyTorchProfiler
    init_args:
      filename: perf_logs
      export_to_chrome: True
    dict_kwargs:
      schedule: torch.profiler.schedule
      init_args:
        skip_first: 10
        wait: 1
        warmup: 1
        active: 2
        repeat: 1
      with_stack: true

But it complains

ValueError: Does not validate against any of the Union subtypes
Subtypes: (<class 'lightning.pytorch.profilers.profiler.Profiler'>, <class 'str'>, <class 'NoneType'>)
Errors:
  - Schedule should be a callable. Found: torch.profiler.schedule
  - Expected a <class 'str'>
  - Expected a <class 'NoneType'>
Given value type: <class 'jsonargparse._namespace.Namespace'>

What version are you seeing the problem on?

v2.2

How to reproduce the bug

No response

Error messages and logs

No response

Environment

Current environment

```
#- PyTorch Lightning Version (e.g., 1.5.0):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
```

More info

No response

cc @carmocca @mauvilsa

awaelchli commented 1 month ago

@tensorcopy Shouldn't the init_args under schedule be indented? Otherwise, how would it be able to associate the init args with the schedule? Also, for custom object initialization, I think you always need to provide a class_path.

awaelchli commented 1 month ago

If I'm lucky (I haven't tried it), this might work:

profiler:
    class_path: lightning.pytorch.profilers.PyTorchProfiler
    init_args:
        filename: perf_logs
        export_to_chrome: True
    dict_kwargs:
        schedule: 
            class_path: torch.profiler.schedule
            init_args:
                skip_first: 10
                wait: 1
                warmup: 1
                active: 2
                repeat: 1
        with_stack: true

tensorcopy commented 1 month ago

Thanks for taking a look! Unfortunately

  profiler:
    class_path: lightning.pytorch.profilers.PyTorchProfiler
    init_args:
      filename: perf_logs
      export_to_chrome: True
    dict_kwargs:
      schedule:
        class_path: torch.profiler.schedule
        init_args:
          skip_first: 10
          wait: 1
          warmup: 1
          active: 2
          repeat: 1
      with_stack: true

throws this

ValueError: Does not validate against any of the Union subtypes
Subtypes: (<class 'lightning.pytorch.profilers.profiler.Profiler'>, <class 'str'>, <class 'NoneType'>)
Errors:
  - Schedule should be a callable. Found: {'class_path': 'torch.profiler.schedule', 'init_args': {'skip_first': 10, 'wait': 1, 'warmup': 1, 'active': 2, 'repeat': 1}}
  - Expected a <class 'str'>
  - Expected a <class 'NoneType'>
Given value type: <class 'jsonargparse._namespace.Namespace'>
Given value: Namespace(class_path='lightning.pytorch.profilers.PyTorchProfiler', init_args=Namespace(dirpath=None, filename='perf_logs', group_by_input_shapes=False, emit_nvtx=False, export_to_chrome=True, row_limit=20, sort_by_key=None, record_module_names=True, table_kwargs=None, record_shapes=False, use_cuda=False))

mauvilsa commented 1 month ago

`torch.profiler.schedule` is a function, not a class, so it cannot be instantiated through `class_path`/`init_args`. Unfortunately, this is currently not supported.
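
You can check this quickly (a minimal sketch):

import inspect
from torch.profiler import schedule

print(inspect.isclass(schedule))     # False: it is a factory function, not a class
print(inspect.isfunction(schedule))  # True
sched = schedule(wait=1, warmup=1, active=2)
print(callable(sched))               # True: it returns a callable mapping a step number to a ProfilerAction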

mauvilsa commented 1 month ago

To make this work currently, you can use a wrapper class like this:

from torch.profiler import schedule

class Schedule:
    """Wraps torch.profiler.schedule so it can be instantiated via
    class_path/init_args in the config."""

    def __init__(self, *args, **kwargs):
        # Build the underlying schedule callable with the given arguments.
        self.schedule = schedule(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        # Delegate to it; the profiler calls the schedule with the current step.
        return self.schedule(*args, **kwargs)

then in the config

schedule:
  class_path: your.module.Schedule
  init_args:
    skip_first: 10
    wait: 1
    warmup: 1
    active: 2
    repeat: 1
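
As a sanity check that the wrapper satisfies the callable requirement (a minimal sketch; `your.module` is the placeholder module path used above):

from your.module import Schedule  # placeholder path, as in the config above

sched = Schedule(skip_first=10, wait=1, warmup=1, active=2, repeat=1)
print(callable(sched))  # True: instances define __call__, so the profiler can invoke them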

tensorcopy commented 1 month ago

Thanks @mauvilsa! I actually tried this approach before, but it still has issues. It seems there is some special requirement around `dict_kwargs`: it only expects a string. Without setting `schedule`, `with_stack: true` works as expected.

With

  profiler:
    class_path: lightning.pytorch.profilers.PyTorchProfiler
    init_args:
      filename: perf_logs
      export_to_chrome: True
    dict_kwargs:
      schedule:
        class_path: common.Schedule
        init_args:
          skip_first: 10
          wait: 1
          warmup: 1
          active: 2
          repeat: 1
      with_stack: true

it throws

ValueError: Does not validate against any of the Union subtypes
Subtypes: (<class 'lightning.pytorch.profilers.profiler.Profiler'>, <class 'str'>, <class 'NoneType'>)
Errors:
  - Schedule should be a callable. Found: {'class_path': 'common.Schedule', 'init_args': {'skip_first': 10, 'wait': 1, 'warmup': 1, 'active': 2, 'repeat': 1}}
  - Expected a <class 'str'>
  - Expected a <class 'NoneType'>
Given value type: <class 'jsonargparse._namespace.Namespace'>
Given value: Namespace(class_path='lightning.pytorch.profilers.PyTorchProfiler', init_args=Namespace(dirpath=None, filename='perf_logs', group_by_input_shapes=False, emit_nvtx=False, export_to_chrome=True, row_limit=20, sort_by_key=None, record_module_names=True, table_kwargs=None, record_shapes=False, use_cuda=False))