Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0
27.96k stars · 3.35k forks

Support mosaic optimizations as plugins #12360

Open williamFalcon opened 2 years ago

williamFalcon commented 2 years ago

This library, mosaic, has neat tricks for optimizing models for faster training. Each optimization is applied to the model with a single line:

import composer.functional as cf
from torchvision import models

my_model = models.resnet18()

# add blurpool and squeeze excite layers
model = cf.apply_blurpool(my_model)
model = cf.apply_squeeze_excite(my_model)

# your own training code starts here

This is something we could do automatically for users under the hood if they want to enable the mosaic optimizations.

I propose an API like this:

import pytorch_lightning as pl

trainer = pl.Trainer(plugins=[
    mosaic.BlurPool(replace_convs=True, replace_maxpools=True, blur_first=True),
    mosaic.ChannelsLast(),
    mosaic.CutMix(num_classes=10),
    mosaic.LabelSmoothing(smoothing=0.1),
])
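One way such a plugin list could reduce to the one-line functional calls shown above: each plugin object carries one model-surgery function, and the trainer applies them in order before fitting. A minimal pure-Python sketch of that dispatch; the `MosaicPlugin` protocol and all names here are assumptions for illustration, not an existing Lightning API:

```python
# Hypothetical sketch: each plugin wraps one model-surgery step and the
# trainer applies them in sequence before training starts.
# None of these names are a real Lightning or composer API.

class MosaicPlugin:
    def apply(self, model):
        raise NotImplementedError


class TagPlugin(MosaicPlugin):
    """Toy stand-in for e.g. BlurPool: 'surgery' just records a tag."""
    def __init__(self, tag):
        self.tag = tag

    def apply(self, model):
        # Return a new model, mirroring the functional style
        # `model = cf.apply_blurpool(model)`.
        return model + [self.tag]


def apply_plugins(model, plugins):
    # Chain the surgeries in the order the user listed them.
    for plugin in plugins:
        model = plugin.apply(model)
    return model


model = []  # toy "model": just a record of applied surgeries
model = apply_plugins(model, [TagPlugin("blurpool"), TagPlugin("squeeze_excite")])
print(model)  # ['blurpool', 'squeeze_excite']
```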

cc @Borda @akihironitta @carmocca @tchaton

ananthsub commented 2 years ago

Why do these need to be plugins? Users can leverage these directly within their LightningModules.
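For instance, a user can run the model surgery themselves where the model is built. The sketch below uses plain `torch` (the channels-last memory format, which is roughly what composer's `ChannelsLast` amounts to) instead of `composer.functional`, to stay dependency-light; the class is a stand-in for a `LightningModule`:

```python
import torch
from torch import nn

# Sketch of doing the one-line surgery yourself at model construction
# time (inside what would be a LightningModule.__init__), with torch's
# channels_last conversion as a composer-free stand-in.

class LitModel(nn.Module):  # stand-in for pl.LightningModule
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
        # one line of "surgery", applied where the research code lives:
        self.backbone = self.backbone.to(memory_format=torch.channels_last)


model = LitModel()
conv_weight = model.backbone[0].weight
print(conv_weight.is_contiguous(memory_format=torch.channels_last))  # True
```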

carmocca commented 2 years ago

I believe the proposed API just mimics what composer offers:

trainer = composer.Trainer(
    ...
    algorithms=[
        BlurPool(replace_convs=True, replace_maxpools=True, blur_first=True),
        ChannelsLast(),
        CutMix(num_classes=10),
        LabelSmoothing(smoothing=0.1),
    ]
)

(from their README)

The key difference here is that (AFAIK) composer does not provide a "Module" abstraction such as the LightningModule, so it is natural for their library to put these directly in their trainer.

Also, looking at the source code, they don't appear to do any special management of these; they just run all algorithms on every trigger event:

https://github.com/mosaicml/composer/blob/42271f8f6b10810d660318d17d037822beb05ee7/composer/core/engine.py#L177-L185
https://github.com/mosaicml/composer/blob/42271f8f6b10810d660318d17d037822beb05ee7/composer/core/engine.py#L192-L196
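The pattern in those lines is simple: on each event, iterate over the algorithms, run the ones whose match condition holds, and apply them. A minimal pure-Python sketch of that engine loop; the names are modeled loosely on composer's concepts and are not its actual API:

```python
# Minimal sketch of composer's engine dispatch: on each event, run every
# algorithm that matches. Names are illustrative, not composer's real API.

class Algorithm:
    def match(self, event, state):
        return True  # run on every event by default

    def apply(self, event, state):
        raise NotImplementedError


class RecordingAlgorithm(Algorithm):
    """Toy algorithm that records the events it was applied on."""
    def __init__(self, name):
        self.name = name
        self.seen = []

    def apply(self, event, state):
        self.seen.append(event)


def run_event(event, state, algorithms):
    # Mirrors the engine loop: no special scheduling, just filter + apply.
    for algo in algorithms:
        if algo.match(event, state):
            algo.apply(event, state)


algos = [RecordingAlgorithm("a"), RecordingAlgorithm("b")]
for event in ["fit_start", "batch_start", "batch_end"]:
    run_event(event, state={}, algorithms=algos)

print(algos[0].seen)  # ['fit_start', 'batch_start', 'batch_end']
```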

These could live directly in the LightningModule, especially because they are part of the research, and that's where you usually put your research code. So unless we need to integrate with strategies or something specific to our internals, I agree with @ananthsub.

carmocca commented 2 years ago

@hanlint do you foresee any limitations that would be better addressed by making the algorithms part of the Trainer?

Some "algorithms" may require overriding several hooks and managing state, but those could just be Callbacks that the user passes to the Trainer.
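As a sketch of that Callback shape: a stateful object that overrides a couple of hooks, here a toy "augmentation" that switches itself off after a fixed number of steps. The hook names follow Lightning's Callback convention, but the rest is illustrative pure Python; a real version would subclass `pytorch_lightning.Callback` and receive the trainer and module as hook arguments:

```python
# Sketch of the 'algorithm as Callback' idea: state lives on the
# callback object and several hooks cooperate to manage it.

class CutoffAugmentation:
    """Toy algorithm: active for the first `max_steps` batches, then off."""
    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.step = 0
        self.active = True

    def on_train_batch_start(self):
        # Decide before each batch whether the augmentation still applies.
        self.active = self.step < self.max_steps

    def on_train_batch_end(self):
        self.step += 1


cb = CutoffAugmentation(max_steps=2)
activity = []
for _ in range(4):  # simulate four training batches
    cb.on_train_batch_start()
    activity.append(cb.active)
    cb.on_train_batch_end()

print(activity)  # [True, True, False, False]
```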

hanlint commented 2 years ago

@carmocca agreed with the approach. We designed the functional API so that users can utilize our methods inside their own training loops, so it would be natural for users wishing to employ our efficiency methods to put them in the LightningModule themselves, where the research code lives.

A few limitations I can think of:

However, most of these are ease-of-use items that could be handled with good docs and warnings (e.g., we recently added algorithm warnings in https://github.com/mosaicml/composer/pull/720) as we harden our functional API.

tchaton commented 2 years ago

Hey @hanlint, I wanted to introduce you to Lightning Flash.

In Flash, we have two custom objects, Input and InputTransform, used to organize data loading and data transforms, plus the concept of an Adapter (learn2learn example) to easily integrate third-party libraries.

Here is the API for the learn2learn integration, for example.

I believe we could explore a Flash integration first, understand how we can address some of the limitations raised above, and upstream any core components which work for all users.

hanlint commented 2 years ago

Thanks @tchaton for the pointer, I will take a look.