Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Refactor trainer._log_device_info() method and warnings #11014

Open · four4fish opened 2 years ago

four4fish commented 2 years ago

Proposed refactor

Raised in discussion by @ananthsub and @justusschock in #11001 (1/n Generalize internal checks for Accelerator in Trainer - remove trainer._device_type).

Motivation

_log_device_info() in the Trainer is too verbose and its messages are not helpful: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L1630-L1661

Accelerator/device selection only happens in the accelerator_connector, so the related warnings and logging should happen in the accelerator_connector as well.

The warnings and their logic can be merged into select_accelerator_type().

Pitch

Simplify the warnings in trainer._log_device_info() and make them less verbose: remove unnecessary warnings and reduce the log level from warning to debug (a rough sketch follows below).

Move _log_device_info() to the accelerator_connector and call it at the end of __init__(), or merge the logic into the accelerator_connector.
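
For illustration, a minimal sketch of what the slimmed-down logging could look like. The placement next to the selection logic and the attribute names (use_gpu, use_tpu) are assumptions; rank_zero_warn is the existing utility, and a plain module logger is used for the debug-level lines:

```python
# Hypothetical sketch only: device logging kept to one concise debug line per
# device type, with an actual warning only for the actionable case. Meant as
# a method on the connector; `self.use_gpu` / `self.use_tpu` are illustrative.
import logging

import torch
from pytorch_lightning.utilities import rank_zero_warn

log = logging.getLogger(__name__)


def _log_device_info(self) -> None:
    log.debug(f"GPU available: {torch.cuda.is_available()}, used: {self.use_gpu}")
    log.debug(f"TPU used: {self.use_tpu}")

    # Keep a real warning only when hardware is present but not selected.
    if torch.cuda.is_available() and not self.use_gpu:
        rank_zero_warn("GPU available but not used. Set `gpus=k` in the Trainer to use it.")
```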

Additional context


If you enjoy Lightning, check out our other projects! ⚡

cc @justusschock @awaelchli @akihironitta @carmocca @edward-io @ananthsub @kaushikb11 @ninginthecloud

kaushikb11 commented 2 years ago

related warnings and logging should happen in the accelerator_connector as well.

@four4fish It was initially part of the AcceleratorConnector, but we came across a use case where _log_device_info needed to be overridden with custom logic. Hence, we introduced it as a method on the Trainer for this purpose.

four4fish commented 2 years ago

@kaushikb11 thank you for sharing the background! Could you share more detail about the use cases? Do you mean users want to override _log_device_info()? There was no linked issue, and I couldn't find any details in the PR. Is this still required?

kaushikb11 commented 2 years ago

Sure! We could have frameworks building on top of the Lightning Trainer. For instance, one framework introduced a new IPEX accelerator and added modifications for it; overriding _log_device_info would let them customize the device info logging as well.
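
As a concrete sketch of that pattern (names like IPEXTrainer and use_ipex are purely illustrative, not real Lightning or IPEX APIs):

```python
# Hypothetical: a framework subclasses Trainer and extends _log_device_info
# to also report its custom (e.g. IPEX) accelerator.
from pytorch_lightning import Trainer
from pytorch_lightning.utilities import rank_zero_info


class IPEXTrainer(Trainer):
    def __init__(self, use_ipex: bool = False, **kwargs):
        # Set before super().__init__(), since Trainer.__init__ calls
        # _log_device_info() during setup.
        self.use_ipex = use_ipex
        super().__init__(**kwargs)

    def _log_device_info(self) -> None:
        super()._log_device_info()  # keep the stock GPU/TPU/IPU messages
        rank_zero_info(f"IPEX available: {self.use_ipex}")
```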

four4fish commented 2 years ago

Interesting use case! But it seems the user extends the Trainer class with their own Accelerator and plugin selection logic, and the accelerator, plugins, and args are then passed into super().__init__(), which calls our accelerator_connector. In that case, couldn't the device logging live in either place? Am I missing something? A rough sketch of the extension pattern I mean (purely illustrative):
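
```python
# Illustrative only: the framework builds its custom accelerator and forwards
# it to Trainer.__init__, which is what constructs and runs the
# AcceleratorConnector — so the device logging could happen there as well.
from pytorch_lightning import Trainer


def make_custom_accelerator():
    # Hypothetical framework helper: would build a custom Accelerator
    # (e.g. for IPEX). Returning None here just falls back to auto-selection.
    return None


class CustomTrainer(Trainer):
    def __init__(self, **kwargs):
        super().__init__(accelerator=make_custom_accelerator(), **kwargs)
```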