PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/
GNU Affero General Public License v3.0

VRAM memory requirements for lora / finetuning? #89

Open GavChap opened 1 month ago

GavChap commented 1 month ago

What are the VRAM and RAM requirements for training a LoRA and for full finetuning?

Is there a way to offload T5 from VRAM / precompute its outputs so that finetuning / LoRA training is possible on something with 16 GB of VRAM?

lawrence-cj commented 1 month ago

Of course. We suggest users extract the T5 and VAE features offline. That way, LoRA training can be done with less than 24 GB of VRAM. Refer to: https://github.com/PixArt-alpha/PixArt-sigma/blob/master/asset/docs/data_feature_extraction.md
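For readers unfamiliar with the pattern being suggested, here is a minimal sketch of offline feature extraction. The tiny `ToyTextEncoder` and `ToyVAE` modules below are stand-ins for the real T5 encoder and VAE (the actual procedure is in the repo doc linked above); `extract_features` is a hypothetical helper name, not part of the codebase. The point is only the shape of the workflow: run the heavy encoders once, save the features to disk, and keep them out of VRAM during training.

```python
# Sketch of the precompute-then-train pattern. ToyTextEncoder / ToyVAE are
# illustrative stand-ins for the real T5 encoder and VAE, not repo code.
import os
import tempfile

import torch
import torch.nn as nn

class ToyTextEncoder(nn.Module):  # stand-in for the T5 text encoder
    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Embedding(100, dim)

    def forward(self, token_ids):
        return self.proj(token_ids)

class ToyVAE(nn.Module):  # stand-in for the image VAE encoder
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=8, stride=8)

    def encode(self, images):
        return self.conv(images)

def extract_features(out_dir, samples):
    """Run the heavy encoders once and save their outputs to disk."""
    text_enc, vae = ToyTextEncoder(), ToyVAE()
    with torch.no_grad():
        for i, (token_ids, image) in enumerate(samples):
            feats = {
                "text_emb": text_enc(token_ids),
                "latent": vae.encode(image.unsqueeze(0)).squeeze(0),
            }
            torch.save(feats, os.path.join(out_dir, f"{i}.pt"))
    # After this returns, neither encoder needs to occupy VRAM during training:
    # the training loop only loads these small cached tensors.

# Usage: precompute once, then load features instead of re-encoding.
samples = [(torch.randint(0, 100, (5,)), torch.randn(3, 32, 32)) for _ in range(2)]
with tempfile.TemporaryDirectory() as d:
    extract_features(d, samples)
    feats = torch.load(os.path.join(d, "0.pt"))
    print(feats["text_emb"].shape, feats["latent"].shape)
```

With the real models, the saved text embedding and image latent are what the diffusion transformer consumes during training, so only the (much smaller) transformer plus cached features need to fit in memory.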

a-One-Fan commented 1 month ago

@lawrence-cj Hello, trying to train a LoRA on the toy dataset (which already has the features extracted) fails for me: it cannot find a "text" column. Furthermore, putting a print at line 647 shows that nothing other than images has been loaded:

DatasetDict({ train: Dataset({ features: ['image'], num_rows: 96})})

Searching for vae in train_pixart_lora_hf.py (the file the guide launches) shows that the VAE is always loaded onto the accelerator device and always used to encode, so even if the features were loaded they would be ignored and the VRAM would still be used?

Log and accelerate command at the bottom: toy_lora_train_log.txt

The CPU device in the log is likely because I have no CUDA devices. I will manually replace some "cuda" strings with my 16 GB Intel "xpu" device later. In either case: pip list

Is train_pixart_lora_hf.py the right file? Looking at its code more, I don't think it is. Is training a LoRA with pre-extracted features actually supported, or did I make some massive mistake?

eeyrw commented 3 weeks ago

> Of course. We suggest users extract the T5 and VAE features offline. That way, LoRA training can be done with less than 24 GB of VRAM. Refer to: https://github.com/PixArt-alpha/PixArt-sigma/blob/master/asset/docs/data_feature_extraction.md

How much VRAM is required for a LoRA / full finetune if the T5 and VAE features are extracted at runtime instead? Then I can rent an appropriately sized GPU cloud instance.

lawrence-cj commented 6 days ago

12GB is enough I suppose.