four4fish opened 2 years ago
Related warnings and logging should happen in accelerator_connector as well.
@four4fish It was initially part of the Accelerator Connector, but we came across a use case where _log_device_info needed to be overridden with custom logic. Hence, we introduced it as a method on the Trainer for this purpose.
@kaushikb11 thank you for sharing the background! Could you share more detail about the use cases? Do you mean users want to override _log_device_info()? There was no linked issue and I couldn't find any details in the PR. Is this still required?
Sure! We could have frameworks building on top of the Lightning Trainer. For instance, one framework introduced a new IPEX accelerator and added modifications for it; overriding _log_device_info would let them customize the device-info logging as well. A sketch of what that might look like is below.
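A minimal sketch of that use case, assuming a hypothetical IPEXTrainer subclass (the class name and log message are illustrative, not actual Lightning or IPEX APIs):

```python
import logging

from pytorch_lightning import Trainer

log = logging.getLogger(__name__)


class IPEXTrainer(Trainer):
    """Hypothetical downstream Trainer that ships its own IPEX accelerator."""

    def _log_device_info(self) -> None:
        # Replace the stock GPU/TPU/IPU messages with framework-specific info
        log.info("IPEX accelerator enabled: running on Intel XPU devices")
        # Optionally keep the default messages for the remaining device types
        super()._log_device_info()
```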
Interesting use case! But it seems the user extended the Trainer class with Accelerator and Plugins selection logic, and the accelerator, plugins, and other args are then passed into super().__init__(), which calls our accelerator_connector. In that case, couldn't the device logging live in either place (see the sketch below)? Am I missing something?
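Roughly this extension pattern (names here are hypothetical; this illustrates the flow, not real downstream code):

```python
from pytorch_lightning import Trainer


class CustomTrainer(Trainer):
    """Hypothetical downstream Trainer that injects its own accelerator."""

    def __init__(self, use_ipex: bool = False, **kwargs):
        if use_ipex:
            # A downstream framework would pass its own Accelerator object here;
            # "ipex" is a stand-in for whatever it registers.
            kwargs["accelerator"] = "ipex"
        # Trainer.__init__ builds the AcceleratorConnector from these arguments,
        # so device selection (and any logging tied to it) happens in there.
        super().__init__(**kwargs)
```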
Proposed refactor
Raised in discussion by @ananthsub and @justusschock in #11001 (1/n Generalize internal checks for Accelerator in Trainer - remove trainer._device_type).
Motivation
_log_device_info() in the Trainer is too verbose and its messages are not helpful: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L1630-L1661
Accelerator/device selection only happens in accelerator_connector, so the related warnings and logging should happen in accelerator_connector as well.
The warning and logging logic can be merged into select_accelerator_type().
Pitch
Simplify the logging in trainer._log_device_info() to make it less verbose: remove unnecessary warnings and reduce the log level from warning to debug.
Move _log_device_info() to accelerator_connector and call it at the end of __init__(), or merge the logic into accelerator_connector (see the sketch below).
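A minimal sketch of the proposed placement, with illustrative stand-ins for the connector internals (the attribute names, placeholder selection logic, and exact message are assumptions, not the current implementation):

```python
import logging

log = logging.getLogger(__name__)


class AcceleratorConnector:
    """Illustrative stand-in for Lightning's internal connector."""

    def __init__(self, accelerator=None, devices=None):
        # ... existing flag parsing and validation would go here ...
        self._device_type = self.select_accelerator_type(accelerator)
        # Log once, at the very end of __init__, after the selection is final.
        self._log_device_info()

    def select_accelerator_type(self, accelerator):
        # Placeholder for the real selection logic.
        return accelerator or "cpu"

    def _log_device_info(self) -> None:
        # debug instead of warning: the message is informational, not actionable.
        log.debug("Trainer will run with device type: %s", self._device_type)
```

This keeps the log next to the selection logic and drops the level to debug, covering both Pitch items at once.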
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.
cc @justusschock @awaelchli @akihironitta @carmocca @edward-io @ananthsub @kaushikb11 @ninginthecloud