GavChap opened 1 month ago
Of course. We suggest users extract the T5 and VAE features offline. That way, LoRA training can be done with less than 24GB of VRAM. Refer to: https://github.com/PixArt-alpha/PixArt-sigma/blob/master/asset/docs/data_feature_extraction.md
@lawrence-cj Hello, trying to train a LoRA on the toy dataset (which already has the features extracted) fails for me.
It fails to find a "text" column. Furthermore, adding a print at line 647 shows that nothing but images has been loaded: DatasetDict({ train: Dataset({ features: ['image'], num_rows: 96 }) })
Searching for "vae" in the train_pixart_lora_hf.py file (which the guide launches) shows that the VAE is always loaded onto the accelerator device and always used to encode, so even if the features were loaded they would be ignored and the VRAM would still be used?
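Something like the following is what I expected the training loop to do when features are present (a hypothetical sketch, not the repo's actual code; `get_latents` and the `"vae_feat"` key are my own names):

```python
# Hypothetical sketch: skip the VAE encode when the dataset already
# carries precomputed latents, so the VAE never needs to sit in VRAM.
def get_latents(batch, vae_encode=None):
    """Return latents for one training step.

    batch      -- dict-like sample; may contain a precomputed "vae_feat" key
                  (name assumed here, written during offline extraction)
    vae_encode -- callable that encodes batch["image"]; only needed (and
                  only loaded onto the device) when features were NOT
                  extracted offline
    """
    if "vae_feat" in batch:
        # Offline-extracted features: use them directly, VAE stays unloaded.
        return batch["vae_feat"]
    if vae_encode is None:
        raise ValueError("No precomputed latents and no VAE provided")
    # Fallback path: encode on the fly (this is what costs VRAM).
    return vae_encode(batch["image"])
```

As far as I can tell, train_pixart_lora_hf.py only ever takes the fallback branch.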
Log, accelerate command at bottom:
toy_lora_train_log.txt
The CPU device is likely because I have no CUDA devices. I will be manually replacing some "cuda" strings with my 16GB Intel "xpu" device later. In either case, pip list
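For reference, this is the kind of device-agnostic replacement I have in mind instead of editing "cuda" strings by hand (a sketch; `pick_device` is my own helper, not anything in the repo, and the probing assumes a PyTorch build with Intel XPU support):

```python
def pick_device(cuda_available, xpu_available):
    """Device preference order (cuda > xpu > cpu), factored out of the
    torch probing so the logic is explicit and testable."""
    if cuda_available:
        return "cuda"
    if xpu_available:
        return "xpu"
    return "cpu"

# Intended use with torch (requires a build exposing torch.xpu):
# device = pick_device(
#     torch.cuda.is_available(),
#     hasattr(torch, "xpu") and torch.xpu.is_available(),
# )
```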
Is train_pixart_lora_hf.py the right file? Looking at its code more, I feel like it's not. Is training a LoRA with the features extracted actually supported? Did I make some massive mistake?
Of course. We suggest users extract the T5 and VAE features offline. That way, LoRA training can be done with less than 24GB of VRAM. Refer to: https://github.com/PixArt-alpha/PixArt-sigma/blob/master/asset/docs/data_feature_extraction.md
How much VRAM is required for a LoRA / full finetune if the T5 and VAE features are extracted at runtime? Then I can rent a proper GPU cloud instance.
12GB is enough, I suppose.
What are the VRAM and RAM requirements for training a LoRA and for finetuning?
Is there a way to offload T5 from VRAM, or to precompute its outputs, to be able to finetune / LoRA-train on something with 16GB of VRAM?
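Something like this one-pass precompute is what I'm hoping is possible (a rough sketch of the idea only; `encode` stands in for the T5 forward pass and is not a function from the repo):

```python
# Hypothetical sketch: run the text encoder over every caption once,
# cache the embeddings, then drop the encoder so its weights never
# compete with the transformer for VRAM during training.
def precompute_text_embeddings(captions, encode):
    """Map each unique caption to its embedding via one encoder pass."""
    cache = {}
    for caption in captions:
        if caption not in cache:
            cache[caption] = encode(caption)  # T5 forward, once per caption
    return cache
```

After building the cache one would delete the encoder and free its memory (e.g. `del text_encoder` followed by the backend's cache-emptying call), so training only ever looks embeddings up in the cache.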