Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Train diffusion model with fabric #19042

Closed caiqi closed 9 months ago

caiqi commented 9 months ago

Bug description

I want to train a text-to-image diffusion model using Lightning. There are mainly three models: a text encoder, a VAE, and a UNet.

I set them up with

fabric.setup(unet, unet_optimizer)
fabric.setup(text_encoder)
fabric.setup(vae)

It works fine with the auto strategy. However, when I switch to the deepspeed strategy, it reports AssertionError: zero stage 2 requires an optimizer.

Is there any solution for this setting, where there are multiple models that do not need training at all?

What version are you seeing the problem on?

v2.1

How to reproduce the bug

No response

Error messages and logs


Environment

Current environment

```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```

More info

No response

cc @awaelchli @carmocca @justusschock

carmocca commented 9 months ago

Can you share the complete error stacktrace?

caiqi commented 9 months ago

> Can you share the complete error stacktrace?

@carmocca Thanks! Here is a minimal reproduction:

import lightning as L
import torch.nn as nn
import torch.optim

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 128, 3, 3)

def main():
    fabric = L.Fabric(
        strategy="deepspeed"
    )
    fabric.launch()
    net_a = Network()
    net_b = Network()
    optim_a = torch.optim.Adam(net_a.parameters())
    # Setting up a model together with its optimizer works fine.
    net_a, optim_a = fabric.setup(net_a, optim_a)
    # Setting up a model without an optimizer triggers the assertion.
    net_b = fabric.setup(net_b)

if __name__ == '__main__':
    main()

and this is the full stacktrace:

Traceback (most recent call last):
  File "/mnt/afs_xxxxx/project/dreamdata/Model/dreamodel/utils/debug_deepspeed.py", line 25, in <module>
    main()
  File "/mnt/afs_xxxxx/project/dreamdata/Model/dreamodel/utils/debug_deepspeed.py", line 21, in main
    net_b = fabric.setup(net_b)
  File "/mnt/afs_xxxxx/anaconda3/envs/pytorch_latest/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 242, in setup
    module = self._strategy.setup_module(module)
  File "/mnt/afs_xxxxx/anaconda3/envs/pytorch_latest/lib/python3.10/site-packages/lightning/fabric/strategies/deepspeed.py", line 337, in setup_module
    self._deepspeed_engine, _ = self._initialize_engine(module)
  File "/mnt/afs_xxxxx/anaconda3/envs/pytorch_latest/lib/python3.10/site-packages/lightning/fabric/strategies/deepspeed.py", line 584, in _initialize_engine
    deepspeed_engine, deepspeed_optimizer, _, _ = deepspeed.initialize(
  File "/mnt/afs_xxxxx/anaconda3/envs/pytorch_latest/lib/python3.10/site-packages/deepspeed/__init__.py", line 171, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/mnt/afs_xxxxx/anaconda3/envs/pytorch_latest/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 309, in __init__
    self.optimizer = self._configure_zero_optimizer(optimizer=None)
  File "/mnt/afs_xxxxx/anaconda3/envs/pytorch_latest/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1465, in _configure_zero_optimizer
    assert not isinstance(optimizer, DummyOptim), "zero stage {} requires an optimizer".format(zero_stage)
AssertionError: zero stage 2 requires an optimizer

My scenario is that net_b is frozen and only used to extract features for net_a, like the VAE in text-to-image diffusion.

awaelchli commented 9 months ago

Hey @caiqi. The default DeepSpeed stage is ZeRO 2. This is a method specific to training and therefore requires an optimizer; setting up a model without one would not make sense. Hence the error: zero stage 2 requires an optimizer.

You didn't mention what your intention is in your post. Maybe you just want to do inference and your model is too large. In that case, only stage 3 makes sense:

fabric = Fabric(..., strategy="deepspeed_stage_3")

# or

from lightning.fabric.strategies import DeepSpeedStrategy
fabric = Fabric(..., strategy=DeepSpeedStrategy(stage=3))
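
With stage 3, the parameters themselves are sharded across GPUs, so a module can then be set up without an optimizer (a sketch; `model` is a placeholder):

model = fabric.setup(model)  # no optimizer needed when only running inference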

There are of course other ways to make inference more memory-efficient, for example 16-bit precision, quantization, etc., which you should try first before reaching for DeepSpeed/multi-GPU.
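
A minimal sketch of the 16-bit route, assuming single-GPU inference and no DeepSpeed; `MyModel` and the input shape are hypothetical placeholders:

import torch
import lightning as L

fabric = L.Fabric(accelerator="cuda", devices=1, precision="bf16-true")  # or "16-mixed"
fabric.launch()

model = MyModel()  # hypothetical frozen model
model.eval()
model = fabric.setup(model)  # no optimizer needed outside of ZeRO stage 2

with torch.no_grad():
    output = model(torch.randn(1, 3, 256, 256, device=fabric.device))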

Please let me know if this helps.

caiqi commented 9 months ago

@carmocca Thanks! My intention is to train a text-to-image diffusion model using PyTorch Lightning. A typical text-to-image diffusion model contains a text encoder and a VAE, which are frozen during training, and a UNet, which is trained. So my focus is on training, not on inference. From the code, do I need to set up several Fabric instances: one for the text encoder and VAE using deepspeed stage 3, and one for the UNet using deepspeed stage 2? Do I need to call launch for each Fabric instance? Thanks!

awaelchli commented 9 months ago

In theory, if you want to use multiple models with different deepspeed stages, then yes, you'll need one Fabric for each. However, I think that wouldn't be necessary in your case, and you could just start by using stage 2. Here is the recipe:

fabric = Fabric(strategy="deepspeed_stage_2", ...)
fabric.launch()

text_encoder = TextEncoder()
vae = VAE()
unet = Unet()

# You only want to train the unet
optimizer = torch.optim.AdamW(unet.parameters())  # any optimizer for the trainable model
unet, optimizer = fabric.setup(unet, optimizer)

# Freeze the others (requires_grad_ is the in-place nn.Module method)
text_encoder.eval()
text_encoder.requires_grad_(False)
vae.eval()
vae.requires_grad_(False)

# Move the frozen ones to the GPU, and use them as-is
text_encoder.to(fabric.device)
vae.to(fabric.device)

# Train
...
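
A sketch of how the frozen models could then be used in the training loop; `vae.encode`, the batch layout, and `compute_loss` are hypothetical placeholders, not a specific diffusion API:

for batch in dataloader:
    images, tokens = batch
    # The frozen models only extract features, so no gradients are needed.
    with torch.no_grad():
        latents = vae.encode(images)       # hypothetical encode method
        text_emb = text_encoder(tokens)
    pred = unet(latents, text_emb)
    loss = compute_loss(pred, latents)     # hypothetical loss helper
    optimizer.zero_grad()
    fabric.backward(loss)                  # use fabric.backward instead of loss.backward
    optimizer.step()
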
caiqi commented 9 months ago

> In theory, if you want to use multiple models with different deepspeed stages, then yes, you'll need one Fabric for each. However, I think that wouldn't be necessary in your case, and you could just start by using stage 2. [...]

Thanks. This works fine.

rob-hen commented 7 months ago
  1. Is it possible to do the same, but using PyTorch Lightning? Loading only the unet with DeepSpeed, and not the other models?
  2. Would the following be possible in PyTorch Lightning?

class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.unet = Unet()
        self.unet.requires_grad_(True)
        self.vae = Vae()
        self.vae.requires_grad_(False)
        self.vae.eval()


Now PyTorch Lightning could check whether there is a submodule for which `p.requires_grad == False` holds for every parameter `p`. In that case, do not wrap it with DeepSpeed; otherwise, use DeepSpeed.
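
A minimal sketch of that check; `is_fully_frozen` is a hypothetical helper, not an existing Lightning API:

import torch.nn as nn

def is_fully_frozen(module: nn.Module) -> bool:
    """Return True if the module has parameters and none of them require gradients."""
    params = list(module.parameters())
    return len(params) > 0 and all(not p.requires_grad for p in params)

# The strategy could then skip wrapping such submodules with DeepSpeed:
submodules_to_wrap = [m for m in model.children() if not is_fully_frozen(m)]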