Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Core Trainer Connectors #10417

Closed awaelchli closed 1 year ago

awaelchli commented 3 years ago

Proposed refactoring or deprecation

Reduce the number of connectors the Trainer relies on to only three core ones: the AcceleratorConnector, the DataConnector, and the LoggerConnector.

Motivation

As part of the Lightning API audit led by @ananthsub + co., we have already proposed several simplifications and code quality improvements to connectors in #7493, #7654, #9778, #10161, #10119, #10108, #10110, etc. A few connectors remain problematic for several reasons.

These properties make most connectors a burden to maintain, as they merely obscure the fact that the Trainer remains an overly powerful class.

Pitch

Remove (refactor away) all connectors except the three core ones listed above.

We (@awaelchli @daniellepintz + co) believe that the complexity these core connectors contain and the responsibility they encapsulate warrant their existence as standalone classes. Hence, we formulate these goals:

  1. Simplify and document the three core connectors listed above
  2. Remove Trainer references
  3. Arrange ownership of components: LoggerConnector should own the logger instance, AcceleratorConnector should own accelerator instance, etc.
  4. Refactor away all others such that their logic lives in the Trainer directly.

Additional context

There is significant overlap between the "DataLoadingMixin" and the DataConnector. Since the "DataLoadingMixin" is not a true mixin and we are aiming to remove the "mixins" from the Trainer entirely, the DataConnector is the natural place for this logic to go.



cc @justusschock @awaelchli @akihironitta @rohitgr7 @kaushikb11 @ninginthecloud

tchaton commented 3 years ago

Hey @awaelchli,

I know there is a strong push to reduce the connectors to a minimal number, and I don't like this effort. @williamFalcon introduced the connectors in the first place to make the Trainer approachable to new readers and contributors. The goal was to keep the highest layer of Lightning as clean as possible.

IMO, the Trainer code right now is getting more complex than it used to be: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L447 and I have seen tweets about Lightning becoming unreadable.

I would prefer that we come up with a better approach to organising the code instead of dumping everything on the Trainer class and making it unreadable.